Publication


Featured publications by Cyril Labbé.


Journal of Quantitative Linguistics | 2001

Inter-Textual Distance and Authorship Attribution: Corneille and Molière

Cyril Labbé; Dominique Labbé

The calculation proposed in this paper measures the proximity between texts. It yields a normalized metric and a distance scale that can be used for authorship attribution. An experiment is presented on one of the most famous cases in French literature: Corneille and Molière. The calculation clearly distinguishes the works of the two authors, but it also demonstrates that Corneille contributed to many of Molière’s masterpieces.
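
A minimal Python sketch of a simplified intertextual distance, under stated assumptions: the word counts of the longer text are scaled down in proportion to the length of the shorter one, and the summed absolute differences are divided by twice the shorter text's length, which keeps the result between 0 (identical frequency profiles) and 1 (no shared vocabulary). The tokenization, lemmatization and rare-word handling used in the paper are not reproduced, and the function name is hypothetical.

```python
from collections import Counter


def intertextual_distance(tokens_a: list[str], tokens_b: list[str]) -> float:
    """Simplified intertextual distance between two token lists, in [0, 1].

    Assumption: the longer text's word frequencies are rescaled to the
    shorter text's length before comparison; refinements such as
    lemmatization or special treatment of rare words are omitted.
    """
    # Make A the shorter text so that B is scaled down to A's size.
    if len(tokens_a) > len(tokens_b):
        tokens_a, tokens_b = tokens_b, tokens_a
    if not tokens_a:
        raise ValueError("both texts must be non-empty")
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    n_a = len(tokens_a)
    scale = n_a / len(tokens_b)
    vocabulary = freq_a.keys() | freq_b.keys()
    # Absolute differences between the counts observed in A and the counts
    # expected in a text of A's length with B's relative frequencies.
    diff = sum(abs(freq_a[w] - freq_b[w] * scale) for w in vocabulary)
    # Both profiles sum to n_a, so the largest possible difference is 2 * n_a.
    return diff / (2 * n_a)
```

For example, a text and the same text repeated twice are at distance 0.0, since their relative frequencies are identical, while `intertextual_distance(["a"], ["b"])` returns 1.0.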


Scientometrics | 2013

Duplicate and fake publications in the scientific literature: how many SCIgen papers in computer science?

Cyril Labbé; Dominique Labbé

Two kinds of bibliographic tools are used to retrieve scientific publications and make them available online. Access to the first kind is free, because these tools store information that is already publicly available online. Access to the second kind requires fees, because these tools are compiled from information supplied by the major publishers of scientific literature. The former can easily be manipulated, but the latter are generally assumed to guarantee the integrity of the data they sell. Unfortunately, duplicate and fake publications are appearing in scientific conferences and, as a result, in bibliographic services. We demonstrate a software method for detecting these duplicate and fake publications. Both free services (such as Google Scholar and DBLP) and fee-based services (such as IEEE Xplore) accept and index these publications.
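
The abstract does not spell out the detection method. One plausible shape for it, sketched here as an assumption, is nearest-neighbour classification against a reference set of papers known to be generated by SCIgen: a candidate is flagged when its distance to the closest known generated text falls below a threshold. The simplified intertextual distance, the function names and the 0.5 default threshold are all illustrative and would need calibrating on real data; a distance near zero between two submitted papers would likewise point to a duplicate.

```python
from collections import Counter


def distance(tokens_a: list[str], tokens_b: list[str]) -> float:
    """Simplified intertextual distance in [0, 1]; the longer text is rescaled."""
    if len(tokens_a) > len(tokens_b):
        tokens_a, tokens_b = tokens_b, tokens_a
    if not tokens_a:
        raise ValueError("both texts must be non-empty")
    fa, fb = Counter(tokens_a), Counter(tokens_b)
    scale = len(tokens_a) / len(tokens_b)
    diff = sum(abs(fa[w] - fb[w] * scale) for w in fa.keys() | fb.keys())
    return diff / (2 * len(tokens_a))


def flag_suspect_papers(
    candidates: dict[str, list[str]],
    known_generated: list[list[str]],
    threshold: float = 0.5,  # hypothetical cut-off, to be tuned on real data
) -> dict[str, float]:
    """Map each flagged paper id to its distance from the nearest generated text."""
    flagged = {}
    for paper_id, tokens in candidates.items():
        nearest = min(distance(tokens, ref) for ref in known_generated)
        if nearest < threshold:
            flagged[paper_id] = nearest
    return flagged
```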


Literary and Linguistic Computing | 2005

A Tool for Literary Studies: Intertextual Distance and Tree Classification

Cyril Labbé; Dominique Labbé

How can proximities and oppositions be measured in large text corpora? Intertextual distance provides a simple and effective solution. Its properties make it a good tool for text classification, especially for tree analysis, which is presented and discussed in full here. Two indices are proposed to measure the quality of this classification. The method provides an accurate tool for literary studies, as is demonstrated by applying it to two areas of French literature: Racine’s tragedies and an authorship-attribution experiment.
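
The paper's own tree-construction method and its two quality indices are not described in the abstract. As a generic stand-in, the sketch below builds a binary tree from a matrix of pairwise distances (for instance intertextual distances) with average-linkage (UPGMA) clustering, a standard agglomerative method; the function name is hypothetical.

```python
def upgma(labels: list[str], dist: list[list[float]]):
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.

    Returns the merge tree as nested tuples of labels.
    """
    if not labels:
        raise ValueError("need at least one label")
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances (UPGMA rule).
        merged = {
            k: (d[min(i, k), max(i, k)] * size_i + d[min(j, k), max(j, k)] * size_j)
            / (size_i + size_j)
            for k in clusters
        }
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        for k, v in merged.items():
            d[k, next_id] = v  # k < next_id, so the key order is preserved
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

For example, with four texts where A and B are close (0.1), C and D are close (0.2), and every other pair is at 0.8, `upgma` returns `(("A", "B"), ("C", "D"))`.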


Journal of Quantitative Linguistics | 2004

Automatic Segmentation of Texts and Corpora

Cyril Labbé; Dominique Labbé; Pierre Hubert

Segmentation of large textual corpora is one of the major questions in literary studies. We present a combination of two relevant methods. First, vocabulary growth analysis highlights the main discontinuities in a work. Second, these results are supplemented by an analysis of variations in vocabulary diversity within the corpus. A segmentation algorithm, combined with a validity test, identifies the optimal division of a work into distinct successive stages. The method is applied to Racine’s works and to various other French works.
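
The segmentation algorithm and validity test are not reproduced here. The sketch below only computes the raw ingredient of vocabulary growth analysis: reading a corpus in chronological order, how many previously unseen word types each successive text introduces. A sudden jump is the kind of discontinuity the method looks for; the function name is hypothetical.

```python
def vocabulary_growth(texts: list[list[str]]) -> list[int]:
    """Number of previously unseen word types introduced by each text, in order."""
    seen: set[str] = set()
    new_counts = []
    for tokens in texts:
        types = set(tokens)
        # Types in this text that no earlier text has used.
        new_counts.append(len(types - seen))
        seen |= types
    return new_counts
```

Raw counts also grow with text length, so in practice they would need normalizing (for example, dividing by each text's token count) before discontinuities are compared.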


Language Resources and Evaluation | 2005

How to Measure the Meanings of Words? Amour in Corneille’s Work

Cyril Labbé; Dominique Labbé

We present a new method for describing the contextual meaning of a key word in a corpus. The vocabulary of the sentences containing this word is compared with that of the entire corpus in order to highlight the words that are significantly over-used in the neighbourhood of the key word (they are associated in the author’s mind) and those that are significantly under-used (they are mutually exclusive). The method provides a useful tool for lexicography and literary studies, as is shown by applying it to the word amour (love) in the work of Pierre Corneille, the most famous French playwright of the 17th century.
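
A minimal sketch of the comparison described above, assuming the corpus is already split into tokenized sentences. Each word's count in the sentences containing the key word is compared with the count expected from its overall corpus frequency, and the difference is standardized with the normal approximation to the hypergeometric distribution. That test is an assumption chosen for illustration; the paper's exact statistic may differ, and the function name is hypothetical.

```python
import math
from collections import Counter


def keyword_specificities(sentences: list[list[str]], keyword: str) -> dict[str, float]:
    """Standardized over-use (> 0) or under-use (< 0) of each word near keyword."""
    corpus = Counter(w for s in sentences for w in s)
    sub = Counter(w for s in sentences if keyword in s for w in s)
    n_total = sum(corpus.values())
    n_sub = sum(sub.values())
    scores: dict[str, float] = {}
    if n_sub == 0 or n_sub == n_total:
        return scores  # no contrast between the sub-corpus and the rest
    for word, f_corpus in corpus.items():
        if word == keyword:
            continue
        p = f_corpus / n_total
        expected = n_sub * p
        # Hypergeometric variance: sampling n_sub tokens without replacement.
        variance = n_sub * p * (1 - p) * (n_total - n_sub) / (n_total - 1)
        if variance > 0:
            scores[word] = (sub[word] - expected) / math.sqrt(variance)
    return scores
```

Words with large positive scores would be read as associated with the key word, and words with large negative scores as mutually exclusive with it.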


Scientometrics | 2018

Detecting automatically generated sentences with grammatical structure similarity

Nguyen Minh Tien; Cyril Labbé

Automatically generated papers have been used on numerous occasions to manipulate bibliographic indexes. This paper examines different means of generating text, such as recurrent neural networks, Markov models, and probabilistic context-free grammars, and whether current approaches can detect them. It then focuses on probabilistic context-free grammars (PCFGs), the most widely used of these. Although multiple approaches exist for detecting such papers, they all work at the document level and cannot detect a small amount of generated text inside a larger body of genuinely written text. We therefore present a grammatical structure similarity measure to detect sentences or short fragments of automatically generated text from known PCFG generators. The proposed approach is tested against a pattern checker and various common machine learning methods. Its ability to detect text from a modified PCFG generator is also tested.
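
The abstract does not define the measure itself. As an illustrative stand-in only, the sketch assumes each sentence has already been reduced to a sequence of grammatical tags (for instance part-of-speech tags from any tagger; the tags in the examples are hypothetical) and scores it by its best normalized edit-distance similarity to structures known to come from a PCFG generator. Because it scores single sentences, it works below the document level; the function names are hypothetical.

```python
def structure_similarity(tags_a: list[str], tags_b: list[str]) -> float:
    """1 minus the normalized Levenshtein distance between two tag sequences."""
    if not tags_a and not tags_b:
        return 1.0
    # Row-by-row dynamic programming over the edit-distance table.
    prev = list(range(len(tags_b) + 1))
    for i, ta in enumerate(tags_a, start=1):
        curr = [i]
        for j, tb in enumerate(tags_b, start=1):
            cost = 0 if ta == tb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1 - prev[-1] / max(len(tags_a), len(tags_b))


def max_similarity_to_generator(tags: list[str], generator_structures: list[list[str]]) -> float:
    """Highest similarity between a sentence structure and any known generated one."""
    return max(structure_similarity(tags, ref) for ref in generator_structures)
```

A sentence whose best score exceeds a calibrated cut-off would be flagged as likely generated.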


Scientometrics | 2017

Striking similarities between publications from China describing single gene knockdown experiments in human cancer cell lines

Jennifer A. Byrne; Cyril Labbé

A comparison of five publications from China describing knockdowns of the human TPD52L2 gene in human cancer cell lines identified unexpected similarities between these publications, flaws in experimental design, and mismatches between some described experiments and the reported results. Following communications with journal editors, two of these TPD52L2 publications have been retracted. One retraction notice stated that although the authors claimed the data were original, the experiments had been outsourced to a biotechnology company. Using search engine queries, automatic text analysis, different similarity measures, and further visual inspection, we identified 48 highly similar papers describing single gene knockdowns in one or two human cancer cell lines, all published by investigators from China. The incorrect use of a particular TPD52L2 shRNA sequence as a negative or non-targeting control was identified in 30/48 (63%) of these publications, using a combination of Google Scholar searches and visual inspection. Overall, these results suggest that some publications describing the effects of single gene knockdowns in human cancer cell lines may include the results of experiments that were not performed by the authors. This has serious implications for the validity of such results and for their application in future research.
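
The abstract does not say which similarity measures were used. One common and simple choice for flagging near-duplicate texts, shown here as an assumption, is the Jaccard similarity of word 3-gram "shingles". The threshold is a hypothetical parameter to tune, the function names are hypothetical, and flagged pairs would still need the visual inspection the authors describe.

```python
def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of word n-grams (shingles) from lower-cased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(docs: dict[str, str], threshold: float = 0.5, n: int = 3):
    """(id_a, id_b, score) for every document pair whose similarity reaches threshold."""
    sets = {doc_id: shingles(text, n) for doc_id, text in docs.items()}
    ids = sorted(sets)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = jaccard(sets[a], sets[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs
```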


Corpus | 2003

La distance intertextuelle [Intertextual distance]

Cyril Labbé; Dominique Labbé


11th International Conference on Textual Data Statistical Analysis | 2012

Detection of Hidden Intertextuality in the Scientific Publications

Cyril Labbé; Dominique Labbé


Archive | 2008

Peut-on se fier aux arbres ? [Can trees be trusted?]

Cyril Labbé; Dominique Labbé

Collaboration


Top Co-Authors

Denis Monière

Université de Montréal

Jennifer A. Byrne

Children's Hospital at Westmead
