Publication


Featured research published by Yves Bestgen.


Cognition & Emotion | 1994

Can emotional valence in stories be determined from words?

Yves Bestgen

In spite of the growing interest witnessed in the study of the relationship between emotion and language, the determination of the emotional valence of sentences, paragraphs or texts has so far attracted little attention. To bridge this gap, a technique based on the emotional aspect of words is presented. In this preliminary study, we have compared the affective tones of the sentences of four texts as perceived by readers, to the values generated by the words that compose the texts. The results support the psychological reality of the affective tones of linguistic units larger than a word, and the possibility of their evaluation through the lexical information. Such information should be useful for studying the role of emotional interest on text processing and for the analysis of the natural stories produced by people in reaction to stressful events.
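As a rough illustration of the word-based idea in this abstract, the sketch below scores a sentence by averaging the valence ratings of its words. The abstract does not say how word values are combined, so the averaging rule, the 1-to-7 scale, the `VALENCE` table, and the function name `sentence_valence` are all assumptions made for the example, not the study's procedure.

```python
import re

# Hypothetical valence norms on a 1 (unpleasant) to 7 (pleasant) scale.
# These ratings are invented for illustration; they are not from the study.
VALENCE = {"happy": 6.5, "gift": 6.0, "storm": 2.5, "lost": 2.0, "home": 5.5}

def sentence_valence(sentence, norms=VALENCE):
    """Return the mean valence of the words found in `norms`, or None."""
    # Lowercase the sentence and keep alphabetic tokens only.
    words = re.findall(r"[a-z']+", sentence.lower())
    # Words without a rating are ignored (an assumption of this sketch).
    scores = [norms[w] for w in words if w in norms]
    return sum(scores) / len(scores) if scores else None

print(sentence_valence("The happy child got a gift."))
print(sentence_valence("They were lost in the storm."))
```

A sentence with no rated words returns `None` rather than a neutral score, so that missing lexical coverage is not mistaken for a neutral tone.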


Behavior Research Methods | 2012

Checking and bootstrapping lexical norms by means of word similarity indexes

Yves Bestgen; Nadja Vincze

In psychology, lexical norms related to the semantic properties of words, such as concreteness and valence, are important research resources. Collecting such norms by asking judges to rate the words is very time consuming, which strongly limits the number of words that compose them. In the present article, we present a technique for estimating lexical norms based on the latent semantic analysis of a corpus. The analyses conducted emphasize the technique’s effectiveness for several semantic dimensions. In addition to the extension of norms, this technique can be used to check human ratings to identify words for which the rating is very different from the corpus-based estimate.


IRAL - International Review of Applied Linguistics in Language Teaching | 2014

The use of collocations by intermediate vs. advanced non-native writers: A bigram-based study.

Sylviane Granger; Yves Bestgen

Phraseological competence, the use of (semi-)prefabricated expressions in language, is a major component of second language acquisition. Recent research focused on lexical bundles, i.e. recurrent contiguous strings of words, has highlighted quantitative and qualitative differences between native and non-native speaker use of these strings. Few studies, however, have investigated the development of phraseological competence as a function of degree of proficiency in L2. Relying on a methodology used by Durrant and Schmitt (2009: IRAL 47, 157–177) to compare native and non-native speakers, the present study identifies significant differences in the way in which intermediate and advanced learners use collocations. In particular, the intermediate learners tend to overuse high frequency collocations (such as hard work) and underuse lower-frequency, but strongly associated, collocations (such as immortal souls). The concluding section addresses the limits of the study and points to possible applications in foreign language teaching and automated scoring.


Journal of Pragmatics | 1998

Segmentation markers as trace and signal of discourse structure

Yves Bestgen

This paper focuses on the functions of segmentation markers, such as punctuation, pauses, connectives, and referential expressions. They highlight continuity and discontinuity in discourse. These markers can be signals to improve comprehension, but also traces of production difficulties that occur when a new topic is introduced. Data are presented to support this double role of signal and trace. We focus on the connective and. Making production more difficult increases the proportion of and just before a topic shift. When used as a signal of high continuity, this connective is not affected by a manipulation of production difficulty.


Discourse Processes | 2006

Toward Automatic Determination of the Semantics of Connectives in Large Newspaper Corpora.

Yves Bestgen; Liesbeth Degand; Wilbert Spooren

We explored the possibility of using automatic techniques to analyze the use of backward causal connectives in large Dutch newspaper corpora. With the help of 2 techniques, Latent Semantic Analysis and Thematic Text Analysis, the contexts of more than 14,000 connectives were studied. The method of analysis is described. We found that differences that have been suggested in the literature via hand-based analyses between these types of connectives (e.g., on dimensions such as subjectivity, change in perspective, and factuality of the connected segments) also appear in our corpus of 16.5 million words.


International Journal of Continuing Engineering Education and Life-Long Learning | 2011

Categorising spelling errors to assess L2 writing

Yves Bestgen; Sylviane Granger

Based on a corpus of 223 argumentative essays written by English as a foreign language learners, this study shows that spelling errors, whether detected manually or automatically, are a reliable predictor of the quality of L2 texts and that reliability is further improved by sub-categorising errors. However, the benefit derived from sub-categorisation is much lower in the case of errors automatically detected by means of the Microsoft Word 2007 spell checker, a situation which results from Word's limited success in detecting and correcting some specific categories of L2 learner errors.


Literary and Linguistic Computing | 2003

Towards Automatic Retrieval of Idioms in French Newspaper Corpora

Liesbeth Degand; Yves Bestgen

The goal of this paper is to present a procedure for the automatic retrieval of idiomatic expressions from large text corpora. The procedure combines text segmentation techniques and Latent semantic analysis. Three indices were computed on the basis of the three-fold hypothesis that: (1) idiomatic expressions should have few neighbours; (2) idiomatic expressions should demonstrate low semantic proximity between the words composing them; (3) idiomatic expressions should demonstrate low semantic proximity between the expression and the preceding and subsequent segments. The result of this procedure shows that we have not yet reached a fully automatic retrieval of idioms from large corpora, but this first trial has shown that we are on the way. The procedure reduces the amount of data to consider to less than a quarter (23.8 per cent) of the original data, of which one-fifth (20.9 per cent) is idiomatic, and nearly 60 per cent (58.8 per cent) is phraseological in nature. In other words, this procedure drastically improves and facilitates hand-based retrieval. In addition, these first results already permit some linguistic exploitation of the retrieved idioms.


Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial) | 2017

Improving the character ngram model for the DSL task with BM25 weighting and less frequently used feature sets.

Yves Bestgen

This paper describes the system developed by the Centre for English Corpus Linguistics (CECL) for discriminating between similar languages, language varieties and dialects. Based on a SVM with character and POStag n-grams as features and the BM25 weighting scheme, it achieved 92.7% accuracy in the Discriminating between Similar Languages (DSL) task, ranking first among eleven systems but with a lead over the next three teams of only 0.2%. A simpler version of the system ranked second in the German Dialect Identification (GDI) task thanks to several ad hoc postprocessing steps. Complementary analyses carried out by a cross-validation procedure suggest that the BM25 weighting scheme could be competitive in this type of task, at least in comparison with the sublinear TF-IDF. POStag n-grams also improved the system performance.
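To make the weighting scheme named in this abstract more concrete, here is a minimal sketch of BM25 weights computed over character n-grams. It uses one common BM25 variant (a non-negative IDF with the usual `k1` and `b` parameters); the abstract does not give the exact formulation or parameter values used by the system, so the defaults, the trigram size, and the function names `char_ngrams` and `bm25_weights` are assumptions for illustration only.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Return the list of overlapping character n-grams in `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def bm25_weights(docs, n=3, k1=1.2, b=0.75):
    """Return one {ngram: BM25 weight} dict per document.

    Assumed variant: idf = ln((N - df + 0.5) / (df + 0.5) + 1).
    """
    if not docs:
        return []
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    lengths = [sum(c.values()) for c in counts]
    avg_len = sum(lengths) / len(lengths)
    # Document frequency: number of documents containing each n-gram.
    df = Counter(g for c in counts for g in c)
    total_docs = len(docs)
    weighted = []
    for c, dl in zip(counts, lengths):
        w = {}
        for g, tf in c.items():
            idf = math.log((total_docs - df[g] + 0.5) / (df[g] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalisation.
            norm = tf + k1 * (1 - b + b * dl / avg_len)
            w[g] = idf * tf * (k1 + 1) / norm
        weighted.append(w)
    return weighted

weights = bm25_weights(["abcabc", "abcxyz"])
print(sorted(weights[1].items()))
```

In a classifier such as the SVM mentioned above, each dictionary would be turned into a sparse feature vector; that step is left out here.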


Literary and Linguistic Computing | 2014

Inadequacy of the chi-squared test to examine vocabulary differences between corpora

Yves Bestgen

Pearson's chi-squared test is probably the most popular statistical test used in corpus linguistics, particularly for studying linguistic variations between corpora. Oakes and Farrow (Literary and Linguistic Computing, 2007, 22, 85-99) proposed various adaptations of this test in order to allow for the simultaneous comparison of more than two corpora, while also yielding an almost correct Type I error rate (i.e. claiming that a word is most frequently found in a variety of English, when in actuality this is not the case). By means of resampling procedures, the present study shows that when used in this context, the chi-squared test produces far too many significant results, even in its modified version. Several potential approaches to circumventing this problem are discussed in the conclusion.
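For readers unfamiliar with the test discussed in this abstract, the sketch below computes the standard Pearson chi-squared statistic for a contingency table of word counts, with one row for the target word and one for all other words, and one column per corpus. The counts are invented and the function name `pearson_chi2` is hypothetical; this shows the unmodified textbook statistic, not the adaptations or the resampling procedures evaluated in the paper.

```python
def pearson_chi2(table):
    """Pearson chi-squared statistic for an r x c table of observed counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_tot[i] * col_tot[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Invented counts: occurrences of one word vs. all other tokens in three corpora.
counts = [
    [120, 80, 100],        # target word
    [9880, 9920, 9900],    # all other words
]
print(pearson_chi2(counts))
```

The statistic would be compared with a chi-squared distribution with (r - 1)(c - 1) degrees of freedom, here 2.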


Empirical Studies of the Arts | 1989

On the Thread of Discourse: Homogeneity, Trends, and Rhythms in Texts

Robert Hogenraad; Yves Bestgen

The rich variety of literary material can usefully be described by quantitative content analysis. Usually, such a description proceeds by segmenting the text into large aggregates that take no account of original word order. A more fine-grained analysis can be obtained by taking word order into account. The latter analysis is closer to the linear nature of the text as narrative and also closer to the true nature of language itself. PROTAN, a computer-aided content analysis system, takes care of all the operations that result in the tagging of text words into an appropriate category (here called a dictionary); meanwhile, the original sequential order of the tagged words is kept unchanged within the text. Trend analyses and time-series analyses can then be performed on the condition that pertinent categories can be shown to be non-randomly distributed throughout the text (non-homogeneity). The corpus reported upon in this article is made up of two sets of texts. The first set consists of seven reference texts—mostly short stories—to serve as foil for a second set of eighteen texts written and distributed by a Belgian terrorist group during 1984 and 1985. The results point first to the psychological significance of whether texts are homogeneous or not on a given dictionary. They point secondly to the pertinence of content rhythms for describing texts. Compared to the reference texts, the eighteen target texts are more orderly in more ways. This is in opposition to the opinion generally held by public authorities.

Collaboration


Top Co-Authors

Liesbeth Degand, Université catholique de Louvain
Sylviane Granger, Université catholique de Louvain
Sophie Piérard, Université catholique de Louvain
Wilbert Spooren, Radboud University Nijmegen
Robert Hogenraad, Catholic University of Leuven
Vincent Dupont, Catholic University of Leuven
Nadja Vincze, Université catholique de Louvain
Jennifer Thewissen, Université catholique de Louvain
Guy Lories, Université catholique de Louvain