Publication


Featured research published by Shu-Kai Hsieh.


International Journal of Computational Linguistics and Chinese Language Processing (中文計算語言學期刊) | 2009

Assessing Text Readability Using Hierarchical Lexical Relations Retrieved from WordNet

Shu-Yen Lin; Cheng-Chao Su; Yu-Da Lai; Li-Chin Yang; Shu-Kai Hsieh

Although some traditional readability formulas have shown high predictive validity in the r=0.8 range and above (Chall & Dale, 1995), they are generally not based on genuine linguistic processing factors, but on statistical correlations (Crossley et al., 2008). Improvement of readability assessment should focus on finding variables that truly represent the comprehensibility of text as well as the indices that accurately measure the correlations. In this study, we explore the hierarchical relations between lexical items based on the conceptual categories advanced from Prototype Theory (Rosch et al., 1976). According to this theory and its development, basic level words like guitar represent the objects humans interact with most readily. They are acquired by children earlier than their superordinate words like stringed instrument and their subordinate words like acoustic guitar. Accordingly, the readability of a text is presumably associated with the ratio of basic level words it contains. WordNet (Fellbaum, 1998), a network of meaningfully related words, provides the best online open source database for studying such lexical relations. Our study shows that a basic level noun can be identified by its ratio of forming compounds (e.g. chair→armchair) and the length difference in relation to its hyponyms. We compared graded readings for American children and high school English readings for Taiwanese students by several readability formulas and in terms of basic level noun ratios (i.e. the number of basic level noun types divided by the number of noun types in a text). It is suggested that basic level noun ratios provide a robust and meaningful index of lexical complexity, which is directly associated with text readability.
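The ratio defined in the abstract (basic level noun types divided by noun types in a text) can be illustrated with a minimal sketch. It assumes the nouns have already been extracted from the text and that a set of basic level nouns is available; the function name `basic_level_noun_ratio` and the toy data are hypothetical, and the paper's own identification procedure (compound-forming ratio and length difference relative to hyponyms) is not reproduced here.

```python
def basic_level_noun_ratio(nouns, basic_level_nouns):
    """Return (# basic level noun types) / (# noun types) for one text.

    `nouns` is every noun token found in the text; `basic_level_nouns`
    is an externally supplied set (an assumption of this sketch).
    """
    # Work on types, not tokens: repeated nouns count once.
    noun_types = {n.lower() for n in nouns}
    if not noun_types:
        return 0.0  # avoid division by zero for texts without nouns
    basic_types = noun_types & {b.lower() for b in basic_level_nouns}
    return len(basic_types) / len(noun_types)


# Hypothetical toy data, not taken from the study.
text_nouns = ["guitar", "chair", "Guitar", "instrument", "armchair"]
basic_level = {"guitar", "chair"}
ratio = basic_level_noun_ratio(text_nouns, basic_level)
print(ratio)
```

Counting types rather than tokens follows the abstract's wording; a token-based variant would weight frequently repeated nouns more heavily.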


Language Resources and Evaluation | 2009

Exploring Interoperability of Language Resources: the Case of Cross-lingual Semi-automatic Enrichment of Wordnets

Claudia Soria; Monica Monachini; Francesca Bertagna; Nicoletta Calzolari; Chu-Ren Huang; Shu-Kai Hsieh; Andrea Marchetti; Maurizio Tesconi

In this paper we present an application fostering the integration and interoperability of computational lexicons, focusing on the particular case of mutual linking and cross-lingual enrichment of two wordnets, the ItalWordNet and Sinica BOW lexicons. This is intended as a case study investigating the needs and requirements of semi-automatic integration and interoperability of lexical resources, with a view to developing a prototype web application to support the GlobalWordNet Grid initiative.


Proceedings of the 7th Workshop on Asian Language Resources | 2009

CWN-LMF: Chinese WordNet in the Lexical Markup Framework

Lung-Hao Lee; Shu-Kai Hsieh; Chu-Ren Huang

The Lexical Markup Framework (LMF, ISO 24613) is the ISO standard that provides a common framework for constructing natural language processing lexicons. LMF facilitates data exchange among computational linguistic resources and promises convenient uniformity for future applications. This study describes the design and implementation of the WordNet-LMF used to represent lexical semantics in Chinese WordNet. The compiled CWN-LMF will be released to the community for linguistic research.


Towards the Multilingual Semantic Web | 2014

A Multilingual Lexico-Semantic Database and Ontology

Francis Bond; Christiane Fellbaum; Shu-Kai Hsieh; Chu-Ren Huang; Adam Pease; Piek Vossen

We discuss the development of a multilingual lexicon linked to the Suggested Upper Merged Ontology (SUMO) formal ontology. Both the ontology and the lexicon have been expressed in the Web Ontology Language (OWL), in addition to their original formats, for use on the semantic web and in linked data. We describe the Open Multilingual Wordnet (OMW), a multilingual wordnet with 22 languages and a rich structure of semantic relations. It is made by exploiting links from various monolingual wordnets to the English Wordnet. Currently, it contains 118,337 concepts expressed in 1,643,260 senses in 22 languages. It is available as simple tab-separated files, Wordnet-Lexical Markup Framework (LMF) or lemon, and has been used by many projects including BabelNet and Google Translate. We discuss some issues in extending the wordnets and improving the multilingual representation to cover concepts not lexicalized in English and how concepts are stated in the formal ontology.
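The idea of linking monolingual wordnets through a shared pivot can be sketched in a few lines. This is an illustration under stated assumptions, not the OMW implementation: each wordnet is modelled as a plain dictionary from a pivot synset ID to its lemmas, and the IDs (`syn-001` and so on), the lemmas, and the function name `link_via_pivot` are all made up for the example.

```python
from collections import defaultdict


def link_via_pivot(wordnet_a, wordnet_b):
    """Pair lemmas of two wordnets that share a pivot synset ID.

    Both arguments map pivot synset IDs to lists of lemmas
    (a simplification assumed for this sketch).
    """
    pairs = defaultdict(set)
    for synset_id, lemmas_a in wordnet_a.items():
        # Only synsets present in both wordnets yield translation pairs.
        for lemma_b in wordnet_b.get(synset_id, ()):
            for lemma_a in lemmas_a:
                pairs[lemma_a].add(lemma_b)
    return {lemma: sorted(targets) for lemma, targets in pairs.items()}


# Hypothetical toy data: synset IDs and entries are illustrative only.
italian = {"syn-001": ["chitarra"], "syn-002": ["sedia", "seggiola"]}
chinese = {"syn-001": ["吉他"], "syn-003": ["桌子"]}
bilingual = link_via_pivot(italian, chinese)
print(bilingual)
```

Synsets that exist in only one wordnet produce no pairs, which mirrors the gap the abstract mentions for concepts not lexicalized in the pivot language.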


Proceedings of the 7th Workshop on Asian Language Resources | 2009

Query Expansion using LMF-Compliant Lexical Resources

Takenobu Tokunaga; Dain Kaplan; Nicoletta Calzolari; Monica Monachini; Claudia Soria; Virach Sornlertlamvanich; Thatsanee Charoenporn; Yingju Xia; Chu-Ren Huang; Shu-Kai Hsieh; Kiyoaki Shirai

This paper reports a prototype multilingual query expansion system relying on LMF-compliant lexical resources. The system is one of the deliverables of a three-year project aiming to establish an international standard for language resources that is applicable to Asian languages. Our main contributions to ISO 24613, the Lexical Markup Framework (LMF) standard, include its robustness in dealing with Asian languages and its applicability to cross-lingual query tasks, as illustrated by the prototype introduced in this paper.
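Cross-lingual query expansion of the kind described can be illustrated with a minimal sketch. It assumes the lexical resource has already been loaded into a nested dictionary (term, then language code, then equivalents); that structure, the function name `expand_query`, and the toy entries are hypothetical stand-ins and do not reflect the actual LMF data model or the prototype's design.

```python
def expand_query(terms, lexicon, target_langs):
    """Append cross-lingual equivalents after each original query term.

    `lexicon` maps a term to {language code: [equivalents]}; this shape
    is an assumption of the sketch, not an LMF structure.
    """
    expanded = []
    for term in terms:
        expanded.append(term)
        entry = lexicon.get(term, {})  # unknown terms pass through unchanged
        for lang in target_langs:
            for equivalent in entry.get(lang, []):
                if equivalent not in expanded:
                    expanded.append(equivalent)
    return expanded


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = {"guitar": {"ja": ["ギター"], "zh": ["吉他"]}}
expanded = expand_query(["guitar", "music"], toy_lexicon, ["ja", "zh"])
print(expanded)
```

Terms missing from the lexicon are kept as they are, so the expanded query never loses an original term.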


IWIC'07 Proceedings of the 1st international conference on Intercultural collaboration | 2007

Fostering intercultural collaboration: a web service architecture for cross-fertilization of distributed wordnets

Francesca Bertagna; Monica Monachini; Claudia Soria; Nicoletta Calzolari; Chu-Ren Huang; Shu-Kai Hsieh; Andrea Marchetti; Maurizio Tesconi

Enhancing the development of multilingual lexicons is of foremost importance for intercultural collaboration to take place, as multilingual lexicons are the cornerstone of several multilingual applications. However, the development and maintenance of large-scale, robust multilingual dictionaries is a tantalizing task. In this paper we present a tool, based on a web service architecture, enabling semi-automatic generation of bilingual lexicons through linking of distributed monolingual lexical resources. In addition to lexicon development, the architecture also allows enrichment of monolingual source lexicons through exploitation of the semantic information encoded in corresponding entries. In the paper we describe our case study applied to the Italian and Chinese wordnets, and we illustrate how the architecture can be extended to access distributed multilingual WordNets over the Internet, paving the way to exploitation in a cross-lingual framework of the wealth of information built over the last decade.


Meeting of the Association for Computational Linguistics | 2006

When Conset Meets Synset: A Preliminary Survey of an Ontological Lexical Resource Based on Chinese Characters

Shu-Kai Hsieh; Chu-Ren Huang

This paper describes an ongoing project concerned with an ontological lexical resource based on the abundant conceptual information grounded in Chinese characters. The ultimate goal of this project is to construct a cognitively sound and computationally effective character-grounded, machine-understandable resource. Philosophically, the Chinese ideogram has its ontological status, but its applicability to NLP tasks has not been expressed explicitly in terms of language resources. We thus propose the first attempt to locate Chinese characters within the context of ontology. Having had initial success in applying it to some NLP tasks, we believe that the construction of this knowledge resource will shed new light on theoretical settings as well as the construction of Chinese lexical semantic resources.


Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications | 2015

Linguistic Linked Data in Chinese: The Case of Chinese Wordnet

Chih-yao Lee; Shu-Kai Hsieh

The present study describes recent developments of Chinese Wordnet, which has been reformatted using the lemon model and published as part of the Linguistic Linked Open Data Cloud. While lemon suffices for modeling most of the structures in Chinese Wordnet at the lexical level, the model does not allow for finer-grained distinction of a word sense, or meaning facets, a linguistic feature also attended to in Chinese Wordnet. As for the representation of synsets, we use the WordNet RDF ontology for integration’s sake. Also, we use another ontology proposed by the Global WordNet Association to show how Chinese Wordnet as Linked Data can be integrated into the Global WordNet Grid.


International Journal of Computer Processing of Languages | 2011

Sense Structure in Cube: Lexical Semantic Representation in Chinese Wordnet

Shu-Kai Hsieh

The representation of lexical semantic knowledge has been one of the most important research topics in the field of computational lexical semantics. Among relevant lexical resources, the design architecture of Princeton WordNet is the most popular one. In this paper, however, we argue that the current synset scheme requires further extensions when applied to the analysis of deeper sense structure in Chinese Wordnet. Issues involved include the underlying structure of senses, meaning facets, and their relations. Based on extensive empirical analysis of sense data, this paper proposes a fine-grained framework for representing lexical semantic knowledge in Chinese Wordnet, which we believe will be an important consideration for the envisioned cross-lingual global wordnet grid construction. The systematic polysemy patterns found among meaning facets can also be used as a human gold standard of hand-annotated data for the metonymy resolution task.


LKR'08 Proceedings of the 3rd international conference on Large-scale knowledge resources: construction and application | 2008

Design and prototype of a large-scale and fully sense-tagged corpus

Sue-jin Ker; Chu-Ren Huang; Jia-Fei Hong; Shi-Yin Liu; Hui-Ling Jian; I-Li Su; Shu-Kai Hsieh

Sense-tagged corpora play a crucial role in natural language processing, especially in research on word sense disambiguation and natural language understanding. A large-scale Chinese sense-tagged corpus is essential, yet the lack of one is a critical deficiency at the current stage. This paper aims to design a large-scale Chinese full-text sense-tagged corpus containing over 110,000 words. The Academia Sinica Balanced Corpus of Modern Chinese (also named Sinica Corpus) serves as the tagging object, and 56 full texts are extracted from it. Using N-gram statistics and collocation information, the preparatory work for automatic sense tagging combines machine learning techniques with a probability model. To achieve highly precise results, the output of automatic sense tagging requires manual revision.

Collaboration


Dive into Shu-Kai Hsieh's collaborations.

Top Co-Authors

Chu-Ren Huang

Hong Kong Polytechnic University

Yu-Yun Chang

National Taiwan University

Piek Vossen

VU University Amsterdam

Kiyoaki Shirai

Japan Advanced Institute of Science and Technology

Takenobu Tokunaga

Tokyo Institute of Technology
