Shiwen Yu
Peking University
Publication
Featured research published by Shiwen Yu.
workshop on chinese lexical semantics | 2015
Lei Wang; Shiwen Yu; Zhimin Wang; Weiguang Qu; Houfeng Wang
Idioms are not only interesting but also distinctive in a language for their continuity and their metaphorical meaning in context. This paper introduces the construction of a Chinese idiom knowledge base by the Institute of Computational Linguistics at Peking University and describes an experiment on the automatic emotion classification of Chinese idioms. In the process, we expect to learn more about how the constituents of a fossilized composition such as an idiom function to shape its emotional properties.
workshop on chinese lexical semantics | 2014
Yuxiang Jia; Hongying Zan; Ming Fan; Shiwen Yu; Zhimin Wang
Metaphor in language is an analogy-based meaning transfer phenomenon that affects question answering, machine translation and other tasks requiring deep semantic analysis. Noun-noun metaphor is a common type of metaphor and is studied in noun-noun semantic analysis. Lexical knowledge bases are an important knowledge source for metaphor recognition and understanding. This paper proposes a method to compute word relevance for noun-noun metaphor recognition with the help of a lexical knowledge base, which shows good performance in the experiments. Furthermore, we investigate the representation of metaphor in lexical knowledge bases and its impact on their construction.
workshop on chinese lexical semantics | 2014
Lei Wang; Shiwen Yu; Zhimin Wang; Weiguang Qu; Houfeng Wang
In the Chinese language, idioms are an essential part of the vocabulary and are used in everyday expression. People like to use idioms for their expressive power, rhetorical skill and special effect, which are mainly created by the metaphors found in most idioms. This paper introduces tentative research on idioms with metaphors based on the Chinese Idiom Knowledge Base (CIKB) built by the Institute of Computational Linguistics at Peking University (ICL/PKU), with which the authors expect to support research and applications on this topic. We believe that such research will benefit NLP tasks like automatic metaphor recognition and processing, semantic role labeling, etc. Our work may also contribute to lexicography, Chinese linguistics and teaching Chinese as a foreign language.
workshop on chinese lexical semantics | 2013
Lei Wang; Shujing Li; Weiguang Qu; Shiwen Yu
In a language, Multi-word Expressions (MWEs, also called “idiomatic expressions” or “set phrases”) are very common in everyday usage. Most linguists hold that MWEs are an inclusive concept that should consist of not only lexical units such as idioms, idiomatic expressions, xiehouyu and proper nouns, but also non-lexical units such as proverbs, maxims and adages. Even expressions that are only statistically idiosyncratic are to be counted as MWEs. In NLP tasks like word segmentation and semantic role labeling, MWEs remain a bottleneck problem. Constructing a knowledge base for MWEs with relatively complete entries and tagged attributes is therefore an effective solution to this problem. This paper introduces the construction and application of an MWE knowledge base by the Institute of Computational Linguistics at Peking University (ICL/PKU), with which the authors expect to support research in this area.
workshop on chinese lexical semantics | 2017
Lei Wang; Shiwen Yu; Houfeng Wang
From Morrison to Pearl S. Buck, the Chinese language has been introducing new words from Western languages, with English as a typical source, for over a hundred years. Generally speaking, this new vocabulary is termed loan words, which can be traced to two major sources: 1. words introduced by Western missionaries who worked in China; 2. words introduced by Chinese intellectuals via Japanese “he zhi han ci” that were originally translated from Western literature in the early 1900s. From the perspective of Chinese-English equivalence, these new words in Chinese form a one-to-one relation with their English source words, for they were directly or indirectly translated from English. Therefore they may translate it into some other expressions. Currently, dictionaries of Chinese loan words serve as the vehicles of this new type of vocabulary, but they exist only in paper versions with a limited number of entries, and so lag behind the fast development of information technology and the growing need for instant acquisition of knowledge. Therefore, compiling a new lexicon of Chinese loan words that have a one-to-one correspondence with English will help translators work with better quality and efficiency.
IOP Conference Series: Materials Science and Engineering | 2017
Lei Wang; Houfeng Wang; Shiwen Yu
Since the proposal of the next-generation Web, the Semantic Web, semantic computing has been drawing more and more attention in academic circles and in industry. A great deal of research has been conducted on the theory and methodology of the subject, and potential applications have been investigated and proposed in many fields. The progress made so far in semantic computing cannot be separated from its supporting pivot: language resources such as language knowledge bases. This paper proposes three perspectives on semantic computing from a macro view and, via a case study of the Institute of Computational Linguistics at Peking University, describes the current state of the construction of language knowledge bases and the related research and applications that have been carried out on the basis of these resources.
workshop on chinese lexical semantics | 2016
Zhimin Wang; Lei Wang; Shiwen Yu
This paper explores the structural characteristics of idioms with “如(Ru)”, focuses on the similarities and differences between the formats “1+Ru+2” and “2+Ru+1”, and summarizes the selection restrictions and metaphorical mapping between tenor and vehicle through an analysis of the word “Ru” in different positions. The study shows that although “Ru” can occupy any of the four positions in an idiom, the third position is the most frequent. The “Ru” idioms have different mapping regularities: the tenor is typically not an abstract or unfamiliar thing but the human body or its parts. Concepts from the five elements, such as “gold, wood, water, fire, earth”, are usually selected as sources in the format “2+Ru+1”, as are living things that were familiar to the ancients.
international conference on asian language processing | 2016
Lei Wang; Weiguang Qu; Houfeng Wang; Shiwen Yu
Influenced by the grammatical systems of Western languages, more and more syntactic structures in modern Chinese can be translated into English attributive clauses. However, because of the great differences between the syntactic structures of the two languages, parsing Chinese by Western grammatical rules usually does not lead to satisfactory results, which in turn causes poor translation performance on complex syntactic structures. This paper first attempts to recognize attributive clauses using conditional random fields (CRFs), selecting representative features combined with grammatical information unique to the Chinese language. The experiment shows that this method produces better results than a statistical machine translation method alone.
workshop on chinese lexical semantics | 2015
Likun Qiu; Hongying Zan; Xuefeng Zhu; Shiwen Yu
Having been debated and studied for more than a century, the part-of-speech classification of contemporary Chinese words still attracts considerable attention from many linguists today. In this study, we aim to compare the classification systems of two lexicons, the Dictionary of Contemporary Chinese (Fifth Edition) and the Grammatical Knowledge-Base of Contemporary Chinese, and to observe the similarities and differences between their part-of-speech classifications of words in a comprehensive way. This paper discusses our preliminary observations, especially on the comparison of prepositions in the two lexicons. We expect that this type of contrastive study will contribute to a deeper understanding of the parts of speech in Chinese, especially the part-of-speech classification of certain Chinese words, which has long been debated.
workshop on chinese lexical semantics | 2013
Yanqiu Shao; Shiwen Yu; Chunxia Liang; Ning Mao
A multilingual computational linguistics dictionary covering English, Chinese, Japanese and German was built by the Institute of Computational Linguistics of Peking University in the 1990s. The dictionary contains more than 5,400 computational linguistics terms and made great contributions to the development of the NLP domain. To build on this earlier achievement, terms that have emerged in the past two decades have been added to the expanded term bank (ETB), which includes about 13,000 English terms, and the number of languages involved has been extended to seven. The seven-language core term bank is now mostly complete. The paper describes the construction of the ETB in detail, including its scale, the sources of its terms and the design of the database management system. The ETB will help promote the development of computational linguistics.