Guillaume Cabanac
University of Toulouse
Publication
Featured research published by Guillaume Cabanac.
Scientometrics | 2011
Guillaume Cabanac
Scientific literature recommender systems (SLRSs) provide papers to researchers according to their scientific interests. These systems rely on inter-researcher similarity measures that are usually computed from publication contents (i.e., by extracting paper topics and citations). We highlight two major issues related to this design. First, the required full-text access and processing are expensive and hardly feasible. Second, clues about meetings, encounters, and informal exchanges between researchers (which relate to a social dimension) have not been exploited to date. To tackle these issues, we propose an original SLRS based on a threefold contribution. First, we argue the case for defining inter-researcher similarity measures building on publicly available metadata. Second, we define topical and social measures that we combine to issue socio-topical recommendations. Third, we conduct an evaluation with 71 volunteer researchers to check researchers’ perception against socio-topical similarities. Experimental results show a significant 11.21% accuracy improvement of socio-topical recommendations compared to baseline topical recommendations.
Social Network Analysis and Mining | 2013
Malik Muhammad Saad Missen; Mohand Boughanem; Guillaume Cabanac
Opinion mining is one of the most challenging tasks in the field of information retrieval. The research community has published many articles on this topic, and interest has increased significantly during the past decade, especially after the launch of several online social networks. In this paper, we provide a detailed overview of related work on opinion mining. The following features distinguish our review from similar works: (1) it offers a different perspective on the field by discussing the work at different granularity levels (word, sentence, and document levels), which is uncommon and much needed; (2) it discusses the related work in terms of the challenges of the field; (3) its document-level discussion gives an overview of the opinion mining task in the blogosphere, one of the most popular online social networks; and (4) it highlights the importance of online social networks for the opinion mining task and related sub-tasks.
Journal of the Association for Information Science and Technology | 2012
Guillaume Cabanac
Characteristics of the Journal of the American Society for Information Science and Technology and 76 other journals listed in the Information Systems category of the Journal Citation Reports–Science edition 2009 were analyzed. Besides reporting usual bibliographic indicators, we investigated the human cornerstone of any peer-reviewed journal: its editorial board. Demographic data about the 2,846 gatekeepers serving in information systems (IS) editorial boards were collected. We discuss various scientometric indicators supported by descriptive statistics. Our findings reflect the great variety of IS journals in terms of research output, author communities, editorial boards, and gatekeeper demographics (e.g., diversity in gender and location), seniority, authority, and degree of involvement in editorial boards. We believe that these results may help the general public and scholars (e.g., readers, authors, journal gatekeepers, policy makers) to revise and increase their knowledge of scholarly communication in the IS field. The EB_IS_2009 dataset supporting this scientometric study is released as online supplementary material to this article to foster further research on editorial boards.
Annual ACIS International Conference on Computer and Information Science | 2009
Malik Muhammad Saad Missen; Mohand Boughanem; Guillaume Cabanac
Opinion detection from blogs has always been a challenge for researchers. However, since the introduction of the Blog track at TREC 2006, considerable improvement has been seen in this field at the document level. Researchers are now considering shifting their focus from opinion finding at the document level to opinion finding at the sentence or passage level. In this paper, we investigate the challenges researchers might face with sentence-level opinion detection and illustrate them with a few examples. Our work also includes the annotation of a small set of opinionated sentences by two annotators, who labeled each sentence as Positive or Negative. The annotation results show that opinion detection at the sentence level is more challenging than opinion detection at the document level. In addition, we discuss the importance of sentence-level opinion detection. Our work can point researchers toward a new direction to explore.
Data Warehousing and Knowledge Discovery | 2007
Guillaume Cabanac; Max Chevalier; Franck Ravat; Olivier Teste
This paper deals with an annotation-based decisional system. The decisional system we present is based on multidimensional databases, which are composed of facts and dimensions. The expertise of decision-makers is modelled, shared, and stored through annotations. These annotations allow decision-makers to engage in active reading and to collaborate with other decision-makers on a common analysis project.
ACM Symposium on Applied Computing | 2013
Firas Damak; Karen Pinel-Sauvagnat; Mohand Boughanem; Guillaume Cabanac
In this paper, we investigate information retrieval in microblogs, exploiting different state-of-the-art features. Besides posting microblogs, microbloggers search for fresh and relevant information related to their interests by submitting a query to a microblog search engine. The majority of approaches that collect information from microblogs exploit features such as the recency of the microblog or the authority of its author to improve the quality of their results. In this paper, we evaluated some of the state-of-the-art features to determine those that discriminate relevant from irrelevant microblogs given an information need. Then, we used the selected features to learn models and determine their effectiveness in a microblog search task. We conducted a series of experiments using the dataset and topics of the TREC Microblog 2011 and 2012 tracks. Results show that content, hypertextuality, and recency are the best predictors of relevance. We also found that Naive Bayes was the most effective learning approach for this type of classification.
International Journal on Digital Libraries | 2010
Damien Palacio; Guillaume Cabanac; Christian Sallaberry; Gilles Hubert
Search engines for Digital Libraries allow users to retrieve documents according to their contents. They process documents without differentiating the manifold aspects of information. Spatial and temporal dimensions are particularly dismissed. These dimensions are, however, of great interest for users of search engines targeting either the Web or specialized Digital Libraries. Recent studies reported that nearly 20% of queries convey spatial and temporal information in addition to topical information. These three dimensions were referred to as parts of “geographic information.” In the literature, search engines handling those dimensions are called “Geographic Information Retrieval (GIR) systems.” Although several initiatives for evaluating GIR systems were undertaken, none was concerned with evaluating these three dimensions altogether. In this article, we address this issue by designing an evaluation framework, the usefulness of which is highlighted through a case study involving a test collection and a GIR system. This framework allowed the comparison of our GIR system to state-of-the-art topical approaches. We also performed experiments for measuring the performance improvement stemming from each dimension or their combination. We show that combining the three dimensions yields an improvement in effectiveness (+73.9%) over a common topical baseline. Moreover, rather than conveying redundancy, the three dimensions complement each other.
Cross Language Evaluation Forum | 2010
Guillaume Cabanac; Gilles Hubert; Mohand Boughanem; Claude Chrisment
We consider Information Retrieval evaluation, especially at TREC with the trec_eval program. It appears that the scores systems obtain depend not only on the relevance of retrieved documents, but also on document names in case of ties (i.e., when documents are retrieved with the same score). We consider this tie-breaking strategy as an uncontrolled parameter influencing measure scores, and argue the case for fairer tie-breaking strategies. A study of 22 TREC editions reveals significant differences between the conventional, unfair TREC strategy and the fairer strategies we propose. This experimental result advocates using these fairer strategies when conducting evaluations.
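The effect of tie-breaking on a measure score can be illustrated with a minimal Python sketch. It is not the authors' code and does not call trec_eval: the function names, the toy run, and the two relevance-aware orderings are hypothetical, and breaking ties in descending document-name order is an assumption made for illustration only.

```python
from typing import List, Set, Tuple

Run = List[Tuple[str, float]]  # (document name, retrieval score)


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of a ranked list of document names."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def rank_by_name(run: Run) -> List[str]:
    """Score descending; ties broken by document name (descending: an assumption)."""
    by_name = sorted(run, key=lambda pair: pair[0], reverse=True)
    # Python's sort is stable, so tied scores keep the name order set above.
    return [doc for doc, _ in sorted(by_name, key=lambda pair: pair[1], reverse=True)]


def rank_relevance_aware(run: Run, relevant: Set[str], relevant_first: bool) -> List[str]:
    """Score descending; within a tie, relevant documents go first or last."""
    def key(pair: Tuple[str, float]):
        doc, score = pair
        tie_rank = 0 if (doc in relevant) == relevant_first else 1
        return (-score, tie_rank, doc)
    return [doc for doc, _ in sorted(run, key=key)]


# Toy run (hypothetical): three documents tied at 0.5, one of them relevant.
run = [("doc_a", 0.9), ("doc_b", 0.5), ("doc_c", 0.5), ("doc_d", 0.5)]
relevant = {"doc_c"}

ap_name = average_precision(rank_by_name(run), relevant)
ap_best = average_precision(rank_relevance_aware(run, relevant, True), relevant)
ap_worst = average_precision(rank_relevance_aware(run, relevant, False), relevant)
print(ap_name, ap_best, ap_worst)
```

In this toy case the same retrieval scores yield three different average precision values depending only on how the tie is resolved, which is the kind of uncontrolled variation the abstract describes.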
Database and Expert Systems Applications | 2007
Guillaume Cabanac; Max Chevalier; Claude Chrisment; Christine Julien
Nowadays, organizational members manage the huge number of digital documents that they exploit at work. To do so, they organize documents into individual hierarchies. These documents are a genuine part of a company's capital, as they reflect past experiences, present competences, and impending expertise. Unfortunately, even if corporate documents represent high value-added material, they still mostly remain unknown to the organization as a whole. For this reason, this paper proposes to build a unified view of corporate documents. Our approach is complementary to current content-based ones because it relies on an original metric related to document usage within an organization.
Association for Information Science and Technology | 2016
Guillaume Cabanac
Research articles disseminate the knowledge produced by the scientific community. Access to this literature is crucial for researchers and the general public. Apparently, “bibliogifts” are available online for free from text‐sharing platforms. However, little is known about such platforms. What is the size of the underlying digital libraries? What are the topics covered? Where do these documents originally come from? This article reports on a study of the Library Genesis platform (LibGen). The 25 million documents (42 terabytes) it hosts and distributes for free are mostly research articles, textbooks, and books in English. The article collection stems from isolated, but massive, article uploads (71%) in line with a “biblioleaks” scenario, as well as from daily crowdsourcing (29%) by worldwide users of platforms such as Reddit Scholar and Sci‐Hub. By relating the DOIs registered at CrossRef and those cached at LibGen, this study reveals that 36% of all DOI articles are available for free at LibGen. This figure is even higher (68%) for three major publishers: Elsevier, Springer, and Wiley. More research is needed to understand to what extent researchers and the general public have recourse to such text‐sharing platforms and why.
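Relating two DOI collections, as described above, amounts to a set intersection once the identifiers are normalized. The following Python sketch shows one way to compute such a coverage ratio. It is not the study's pipeline: the function names, the prefix list, and the sample DOIs are hypothetical, and the toy output has no relation to the percentages reported in the abstract.

```python
from typing import Iterable

# URL and scheme prefixes stripped before comparison (an illustrative list).
_PREFIXES = ("https://doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a leading resolver prefix, if any.

    DOIs are case-insensitive, so lowercasing makes string comparison safe.
    """
    doi = doi.strip().lower()
    for prefix in _PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def coverage(registered: Iterable[str], cached: Iterable[str]) -> float:
    """Share of registered DOIs that also appear in the cached collection."""
    registered_set = {normalize_doi(d) for d in registered}
    cached_set = {normalize_doi(d) for d in cached}
    if not registered_set:
        return 0.0
    return len(registered_set & cached_set) / len(registered_set)


# Hypothetical sample identifiers, for illustration only.
registered = ["10.1000/A", "doi:10.1000/b", "10.1000/c", "10.1000/d"]
cached = ["10.1000/a", "https://doi.org/10.1000/D", "10.9999/x"]
print(f"{coverage(registered, cached):.0%}")
```

Using sets keeps the comparison linear in the number of identifiers, which matters when both collections hold tens of millions of DOIs.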