Publication


Featured research published by Mathieu Roche.


conference on soft computing as transdisciplinary science and technology | 2008

Web opinion mining: how to extract opinions from blogs?

Ali Harb; Michel Plantié; Gérard Dray; Mathieu Roche; François Trousset; Pascal Poncelet

The growing popularity of Web 2.0 provides an increasing number of documents expressing opinions on different topics. Recently, new research approaches have been defined in order to automatically extract such opinions from the Internet. They usually consider opinions to be expressed through adjectives, and make extensive use of either general dictionaries or experts to provide the relevant adjectives. Unfortunately, these approaches suffer from the following drawback: in a specific domain, a given adjective may either not exist or have a different meaning than in another domain. In this paper, we propose a new approach consisting of two steps. First, we automatically extract a learning dataset for a specific domain from the Internet. Second, from this learning set we extract the set of positive and negative adjectives relevant to the domain. The usefulness of our approach was demonstrated by experiments performed on real data.


database and expert systems applications | 2011

Towards an on-line analysis of tweets processing

Sandra Bringay; Nicolas Béchet; Flavien Bouillot; Pascal Poncelet; Mathieu Roche; Maguelonne Teisseire

Tweets exchanged over the Internet represent an important source of information, even if their characteristics make them difficult to analyze (a maximum of 140 characters, etc.). In this paper, we define a data warehouse model to analyze large volumes of tweets by proposing measures relevant in the context of knowledge discovery. The use of data warehouses as a tool for the storage and analysis of textual documents is not new, but current measures are not well suited to the specificities of the manipulated data. We also propose a new way of extracting the context of a concept in a hierarchy. Experiments carried out on real data underline the relevance of our proposal.


Journal of Biomedical Informatics | 2011

Sequential patterns mining and gene sequence visualization to discover novelty from microarray data

Arnaud Sallaberry; Nicolas Pecheur; Sandra Bringay; Mathieu Roche; Maguelonne Teisseire

Data mining allows users to discover novelty in huge amounts of data. Frequent pattern methods have proved to be efficient, but the extracted patterns are often too numerous and thus difficult for end users to analyze. In this paper, we focus on sequential pattern mining and propose a new visualization system to help end users analyze the extracted knowledge and to highlight novelty according to databases of referenced biological documents. Our system is based on three visualization techniques: clouds, solar systems, and treemaps. We show that these techniques are very helpful for identifying associations and hierarchical relationships between patterns among related documents. Sequential patterns extracted from gene data using our system were successfully evaluated by two biology laboratories working on Alzheimer's disease and cancer.


Contexts | 2007

AcroDef: a quality measure for discriminating expansions of ambiguous acronyms

Mathieu Roche; Violaine Prince

This paper presents a set of quality measures to determine the choice of the best expansion for an acronym not defined in the Web page. The method uses statistics computed on Web pages to determine the appropriate expansion. Measures are context-based and rely on the assumption that the most frequent words in the page are related semantically or lexically to the acronym expansion.


international symposium on methodologies for intelligent systems | 2009

Job Offer Management: How Improve the Ranking of Candidates

Rémy Kessler; Nicolas Béchet; Juan-Manuel Torres-Moreno; Mathieu Roche; Marc El-Bèze

The market for online job search sites is growing exponentially. As a result, the volume of information (mostly in the form of free text) has become impossible to process manually. Assisted analysis and categorization seem relevant to address this issue. We present E-Gen, a system which aims to perform assisted analysis and categorization of job offers and of the responses of candidates. This paper presents several strategies based on vectorial and probabilistic models to solve the problem of profiling applications according to a specific job offer. Our objective is a system capable of reproducing the judgement of the recruitment consultant. We have evaluated a range of similarity measures for ranking candidate applications by using ROC curves. A relevance feedback approach surpasses our previous results on this task, which is difficult, diverse and highly subjective.
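As an illustration of how a candidate ranking can be scored with a ROC curve, here is a minimal sketch that computes the area under the curve by pairwise comparison. The scores and relevance labels are hypothetical, and the function name `roc_auc` is an assumption made for this example; it is not taken from the E-Gen system.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed by pairwise comparison.

    Returns the fraction of (relevant, non-relevant) pairs in which the
    relevant item has the higher score; tied pairs count as half.
    """
    relevant = [s for s, y in zip(scores, labels) if y == 1]
    non_relevant = [s for s, y in zip(scores, labels) if y == 0]
    if not relevant or not non_relevant:
        raise ValueError("need at least one relevant and one non-relevant item")
    wins = 0.0
    for r in relevant:
        for n in non_relevant:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(relevant) * len(non_relevant))


# Hypothetical similarity scores for four applications, with the
# recruiter's judgement as the label (1 = retained, 0 = rejected).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)  # 3 of the 4 pairs are ordered correctly
```

A value of 1.0 means every retained application is ranked above every rejected one, while 0.5 corresponds to a random ordering.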


data warehousing and knowledge discovery | 2008

Is a Voting Approach Accurate for Opinion Mining

Michel Plantié; Mathieu Roche; Gérard Dray; Pascal Poncelet

In this paper, we focus on classifying documents according to the opinions and value judgments they contain. The main originality of our approach is to combine linguistic pre-processing, classification and a voting system using several classification methods. In this context, a relevant representation of the documents makes it possible to determine the features for storing textual data in data warehouses. Experiments conducted on very large corpora from a French text mining challenge (DEFT) show the efficiency of our approach.
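The abstract does not specify which voting scheme is used. As a minimal sketch, assuming a simple majority vote over the labels returned by several classifiers, the combination step could look like the following; the function name and the example labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers.

    `predictions` holds one label per classifier. Ties are broken in
    favour of the label that appears first in the list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of three classifiers for a single document.
votes = ["positive", "negative", "positive"]
decision = majority_vote(votes)
```

Weighted voting, where each classifier's vote counts in proportion to its validation accuracy, is a common alternative to this unweighted scheme.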


database and expert systems applications | 2011

Towards an automatic characterization of criteria

Benjamin Duthil; François Trousset; Mathieu Roche; Gérard Dray; Michel Plantié; Jacky Montmain; Pascal Poncelet

The number of documents is growing exponentially with the rapid expansion of the Web. The new challenge for Internet users is to rapidly find data appropriate to their requests. Information retrieval, automatic classification and opinion detection thus appear as major issues in our information society. Many efficient tools have already been proposed to help Internet users search the Web and to support them in their choices. Users would now like genuine decision tools that efficiently support them in focusing on relevant information according to specific criteria in their area of interest. In this paper, we propose a new approach for the automatic characterization of such criteria. We show that this approach is able to automatically build a relevant lexicon for each criterion. We then show how this lexicon can be useful for document classification or segmentation tasks. Experiments carried out on real datasets show the efficiency of our proposal.


Archive | 2009

Information Retrieval in Biomedicine: Natural Language Processing for Knowledge Integration

Violaine Prince; Mathieu Roche

Today, there is intense interest in bio natural language processing (NLP), creating a need among researchers, academicians, and practitioners for a comprehensive publication of articles in this area. Information Retrieval in Biomedicine: Natural Language Processing for Knowledge Integration provides relevant theoretical frameworks and the latest empirical research findings in this area, organized according to linguistic granularity. As a critical mass of advanced knowledge, this book presents original applications, going beyond existing publications while opening the way to a broader use of NLP in biomedicine.


International Journal of Knowledge Discovery in Bioinformatics | 2014

Towards a Mixed Approach to Extract Biomedical Terms from Text Corpus

Juan Antonio Lossio Ventura; Clement Jonquet; Mathieu Roche; Maguelonne Teisseire

The objective of this paper is to present a methodology to automatically extract and rank biomedical terms from free text. The authors present new extraction methods taking into account linguistic patterns specialized for the biomedical domain, statistical term extraction measures such as C-value, and statistical keyword extraction measures such as Okapi BM25 and TF-IDF. These measures are combined in order to improve the extraction process, and the authors investigate which combinations are the most relevant in different contexts. Experimental results show that an appropriate harmonic mean of C-value and keyword extraction measures offers better precision, both for single-word and multi-word term extraction. Experiments describe the extraction of English and French biomedical terms from a corpus of laboratory tests available online. The results are validated by using UMLS in English and only MeSH in French as the reference dictionary.
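The abstract does not give the exact combination formula. As a rough sketch, assuming both measures have already been normalized to non-negative scores on a comparable scale, a plain harmonic mean of a C-value score and a keyword extraction score could be used to rank candidate terms as follows. The function names and all score values are hypothetical.

```python
def harmonic_mean(x, y):
    """Harmonic mean of two non-negative scores (0.0 if both are zero)."""
    if x + y == 0:
        return 0.0
    return 2 * x * y / (x + y)


def rank_terms(c_value, keyword_score):
    """Rank the terms present in both score tables by harmonic mean.

    Both arguments map a candidate term to a score. They are assumed to
    be normalized beforehand so that the two scales are comparable.
    """
    shared = c_value.keys() & keyword_score.keys()
    combined = {t: harmonic_mean(c_value[t], keyword_score[t]) for t in shared}
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical normalized scores for three candidate terms.
c_value = {"blood test": 0.8, "test": 0.9, "glucose level": 0.6}
tf_idf = {"blood test": 0.7, "test": 0.1, "glucose level": 0.6}
ranking = rank_terms(c_value, tf_idf)
```

The harmonic mean penalizes a term that scores well on only one measure: here "test" has the highest C-value but falls to the bottom of the combined ranking because its keyword score is low.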


OTM '08 Proceedings of the OTM Confederated International Workshops and Posters on On the Move to Meaningful Internet Systems: 2008 Workshops: ADI, AWeSoMe, COMBEK, EI2N, IWSSA, MONET, OnToContent + QSI, ORM, PerSys, RDDS, SEMELS, and SWWS | 2008

Automatic Profiling System for Ranking Candidates Answers in Human Resources

Rémy Kessler; Nicolas Béchet; Mathieu Roche; Marc El-Bèze; Juan-Manuel Torres-Moreno

The exponential growth of the Internet has allowed the development of a market of online job search sites. This work presents the E-Gen system (Automatic Job Offer Processing system for Human Resources). E-Gen will implement several complex tasks: an analysis and categorization of job offers, which are unstructured text documents (e-mails of job offers, possibly with an attached document), and an analysis and relevance ranking of the candidate answers. We present a strategy to resolve the last task: after a process of filtering and lemmatisation, we use vectorial representation and different similarity measures. The quality of the ranking obtained is evaluated using ROC curves.
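To make the vectorial ranking step concrete, here is a minimal sketch that represents texts as bag-of-words vectors and ranks candidate answers by cosine similarity to the job offer. Cosine is only one of the possible similarity measures, the filtering and lemmatisation steps are reduced to lowercasing and whitespace splitting, and the function names and sample texts are hypothetical rather than taken from E-Gen.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vectorize(text):
    """Bag-of-words vector; stands in for filtering and lemmatisation."""
    return Counter(text.lower().split())


def rank_candidates(offer, candidates):
    """Return candidate names sorted by decreasing similarity to the offer."""
    offer_vec = vectorize(offer)
    scored = [(cosine(offer_vec, vectorize(text)), name)
              for name, text in candidates.items()]
    return [name for score, name in sorted(scored, key=lambda p: -p[0])]


# Hypothetical job offer and candidate answers.
offer = "python developer data mining"
candidates = {
    "a": "java developer",
    "b": "python data mining engineer",
}
ranking = rank_candidates(offer, candidates)
```

In this example candidate "b" shares three terms with the offer and is ranked first, ahead of "a", which shares only one.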

Collaboration


Dive into Mathieu Roche's collaboration.

Top Co-Authors

Pascal Poncelet

University of Montpellier

Maguelonne Teisseire

Centre national de la recherche scientifique

Violaine Prince

Centre national de la recherche scientifique

Cédric Lopez

Centre national de la recherche scientifique

Nicolas Béchet

Centre national de la recherche scientifique

Anne Laurent

University of Montpellier

Jérôme Azé

Centre national de la recherche scientifique

Flavien Bouillot

Centre national de la recherche scientifique

Hassan Saneifar

University of Montpellier

Renaud Lancelot

Institut national de la recherche agronomique
