Karen Pinel-Sauvagnat
University of Toulouse
Publication
Featured research published by Karen Pinel-Sauvagnat.
Conference on Information and Knowledge Management | 2011
Arlind Kopliku; Mohand Boughanem; Karen Pinel-Sauvagnat
In this paper, we propose an attribute retrieval approach which extracts and ranks attributes from HTML tables. We distinguish between class attribute retrieval and instance attribute retrieval. On the one hand, given an instance (e.g. University of Strathclyde), we retrieve its attributes from the Web (e.g. principal, location, number of students). On the other hand, given a class (e.g. universities) represented by a set of instances, we retrieve the common attributes of its instances. Furthermore, we show that instance attribute retrieval can be reinforced if similar instances are available. Our approach uses HTML tables, which are probably the largest source for attribute retrieval. Three recall-oriented filters are applied to tables to check the following three properties: (i) whether the table is relational, (ii) whether the table has a header, and (iii) whether its attributes and values conform. Candidate attributes are extracted from tables and ranked with a combination of relevance features. Our approach is shown to have high recall and reasonable precision. Moreover, it outperforms state-of-the-art techniques.
ACM Symposium on Applied Computing | 2013
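The class attribute retrieval idea in the abstract above (retrieving attributes shared by the instances of a class) can be illustrated with a minimal sketch. This is not the authors' method: the function name `common_attributes`, the `min_share` threshold, and the toy data are all assumptions made for illustration, and ranking by the share of instances that carry an attribute stands in for the paper's combination of relevance features.

```python
from collections import Counter


def common_attributes(instance_attrs, min_share=0.5):
    """Rank attributes by the share of instances that carry them.

    instance_attrs maps an instance name to its list of attribute names.
    min_share is a hypothetical cut-off, not a value from the paper.
    """
    counts = Counter()
    for attrs in instance_attrs.values():
        # Normalise case and whitespace; count each attribute once per instance.
        counts.update({a.strip().lower() for a in attrs})
    n = len(instance_attrs)
    ranked = sorted(
        ((attr, count / n) for attr, count in counts.items()),
        key=lambda pair: (-pair[1], pair[0]),  # highest share first, then alphabetical
    )
    return [(attr, share) for attr, share in ranked if share >= min_share]


# Toy data, invented for illustration only.
universities = {
    "University of Strathclyde": ["principal", "location", "number of students"],
    "University of Toulouse": ["location", "number of students", "founded"],
    "University of Glasgow": ["Principal", "Location", "motto"],
}
result = common_attributes(universities)
```

With this toy input, only attributes present in at least half of the instances are kept, so rarely shared ones such as "motto" are dropped.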
Firas Damak; Karen Pinel-Sauvagnat; Mohand Boughanem; Guillaume Cabanac
In this paper, we investigate information retrieval in microblogs, exploiting different state-of-the-art features. Besides posting microblogs, microbloggers search for fresh and relevant information related to their interests by submitting a query to a microblog search engine. The majority of approaches that collect information from microblogs exploit features such as the recency of the microblog and the authority of its author to improve the quality of their results. In this paper, we evaluated some of the state-of-the-art features to determine those that discriminate relevant from irrelevant microblogs given an information need. Then, we used the selected features to learn models and determine their effectiveness in a microblog search task. We conducted a series of experiments using the dataset and topics of the TREC Microblog 2011 and 2012 tracks. Results show that content, hypertextuality, and recency are the best predictors of relevance. We also found that Naive Bayes was the most effective learning approach for this type of classification.
International Journal of Business Intelligence and Data Mining | 2010
Mouna Torjmen; Karen Pinel-Sauvagnat; Mohand Boughanem
In this paper, we investigate the use of XML structure in multimedia retrieval, particularly in context-based image retrieval. We propose two methods to represent multimedia objects: the first is based on an implicit use of the textual and structural context of multimedia objects, whereas the second is based on an explicit use of both sources. Experimental evaluation is carried out using the INEX MultimediaFragments Task 2006 and 2007. We show that there is a strong vocabulary relation between the query and the multimedia object representation, and that using XML structure significantly improves the effectiveness of multimedia retrieval.
ACM/IEEE Joint Conference on Digital Libraries | 2011
Arlind Kopliku; Karen Pinel-Sauvagnat; Mohand Boughanem
In this paper, we propose an attribute retrieval approach which extracts and ranks attributes from Web tables. We combine simple heuristics to filter out improbable attributes, and we rank attributes based on frequencies and a table match score. Ranking is reinforced with external evidence from Web search, DBPedia and Wikipedia. Our approach can be applied to any instance (e.g. Canada) to retrieve its attributes (e.g. capital, GDP). It is shown to have a much higher recall than DBPedia and Wikipedia, and to work better than lexico-syntactic rules for the same purpose.
Web Intelligence | 2011
Arlind Kopliku; Firas Damak; Karen Pinel-Sauvagnat; Mohand Boughanem
Major search engines perform what is known as Aggregated Search (AS). They integrate results coming from different vertical search engines (images, videos, news, etc.) with typical Web search results. Aggregated search is relatively new and its advantages need to be evaluated. Some existing works have already tried to evaluate the interest (usefulness) of aggregated search as well as the effectiveness of the existing approaches. However, most evaluation methodologies were based on (i) what we call relevance by intent (i.e. search results were not shown to real users), and (ii) short text queries. In this paper, we conducted a user study designed to revisit and compare the interest of aggregated search, by exploiting both relevance by intent and relevance by content, and using both short text and fixed need queries. This user study allowed us to analyze the distribution of relevant results across different verticals, and to show that AS helps to identify complementary relevant sources for the same information need. Comparison between relevance by intent and relevance by content showed that relevance by intent introduces a bias in evaluation. Discussion of the results also allowed us to identify some useful considerations for the evaluation of AS approaches.
String Processing and Information Retrieval | 2011
Arlind Kopliku; Karen Pinel-Sauvagnat; Mohand Boughanem
In this paper, we propose an attribute retrieval approach which extracts and ranks attributes from HTML tables. Given an instance (e.g. Tower of Pisa), we want to retrieve its attributes from the Web (e.g. height, architect). Our approach uses HTML tables, which are probably the largest source for attribute retrieval. Three recall-oriented filters are applied to tables to check the following three properties: (i) whether the table is relational, (ii) whether the table has a header, and (iii) whether its attributes and values conform. Candidate attributes are extracted from tables and ranked with a combination of relevance features. Our approach can be applied to all instances and is shown to have high recall and reasonable precision. Moreover, it outperforms state-of-the-art techniques.
Conference on Information and Knowledge Management | 2017
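The three table filters named in the abstract above can be pictured with a minimal sketch. The abstract does not specify how each property is tested, so every heuristic here is an assumption: the function names, the `max_words` limit, and the representation of a table as a list of rows of strings are hypothetical, and the sketch simply returns header cells as candidate attributes without the ranking step.

```python
def is_number(cell):
    """True if the cell parses as a number (thousands separators allowed)."""
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False


def is_relational(table):
    # Assumed test: at least two rows and two columns, all rows equally long.
    return (
        len(table) >= 2
        and len(table[0]) >= 2
        and all(len(row) == len(table[0]) for row in table)
    )


def has_header(table):
    # Assumed test: every cell of the first row is non-empty, non-numeric text.
    return all(cell.strip() and not is_number(cell) for cell in table[0])


def conforms(table, max_words=4):
    # Assumed test: attribute names are short and every column holds a value.
    header, body = table[0], table[1:]
    names_ok = all(len(cell.split()) <= max_words for cell in header)
    values_ok = all(
        any(row[i].strip() for row in body) for i in range(len(header))
    )
    return names_ok and values_ok


def candidate_attributes(table):
    """Return header cells as candidates if the table passes all three filters."""
    if is_relational(table) and has_header(table) and conforms(table):
        return [cell.strip().lower() for cell in table[0]]
    return []


# Toy tables, invented for illustration only.
data_table = [["Name", "Height", "Architect"], ["Tower of Pisa", "56 m", "n/a"]]
layout_table = [["Home | About | Contact"], ["Welcome to our site"]]
numeric_table = [["1", "2"], ["3", "4"]]
```

Here `candidate_attributes(data_table)` yields the lowercased header cells, while the single-column layout table and the header-less numeric table are rejected. Keeping the tests loose reflects the recall-oriented intent stated in the abstract.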
Thibaut Thonet; Guillaume Cabanac; Mohand Boughanem; Karen Pinel-Sauvagnat
Social media platforms such as weblogs and social networking sites provide Internet users with an unprecedented means to express their opinions and debate a wide range of issues. Concurrently with their growing importance in public communication, social media platforms may foster echo chambers and filter bubbles: homophily and content personalization lead users to be increasingly exposed to conforming opinions. There is therefore a need for unbiased systems able to identify and provide access to varied viewpoints. To address this task, we propose in this paper a novel unsupervised topic model, the Social Network Viewpoint Discovery Model (SNVDM). Given a specific issue (e.g., U.S. policy) as well as the text and social interactions from the users discussing this issue on a social networking site, SNVDM jointly identifies the issue's topics, the users' viewpoints, and the discourse pertaining to the different topics and viewpoints. In order to overcome the potential sparsity of the social network (i.e., some users interact with only a few other users), we propose an extension to SNVDM based on the Generalized Pólya Urn sampling scheme (SNVDM-GPU) to leverage "acquaintances of acquaintances" relationships. We benchmark the different proposed models against three baselines, namely TAM, SN-LDA, and VODUM, on a viewpoint clustering task using two real-world datasets. We thereby provide evidence that our model SNVDM and its extension SNVDM-GPU significantly outperform state-of-the-art baselines, and we show that utilizing social interactions greatly improves viewpoint clustering performance.
ACM Symposium on Applied Computing | 2015
Rafik Abbes; Karen Pinel-Sauvagnat; Nathalie Hernandez; Mohand Boughanem
In this paper, we aim to filter documents containing timely and relevant information about an entity (e.g., a person, a place, an organization) from a document stream. These documents, which we call vital documents, provide relevant and fresh information about the entity. The approach we propose leverages the temporal information reflected by the temporal expressions in the document in order to infer its vitality. Experiments carried out on the 2013 TREC Knowledge Base Acceleration (KBA) collection show the effectiveness of our approach compared to state-of-the-art approaches.
Geographic Information Retrieval | 2013
Damien Palacio; Guillaume Cabanac; Gilles Hubert; Karen Pinel-Sauvagnat; Christian Sallaberry
We introduce a framework for searching places according to user interests and spatial context. Our framework combines existing geo-tools or services (e.g., Google Places, Yahoo! BOSS Geo Services, PostGIS, Gisgraphy, Geonames) and ranks results according to features such as distance, popularity, and user preferences. We used this framework to participate in the TREC 2013 Contextual Suggestion Track.
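The ranking step mentioned in the abstract above (combining distance, popularity, and user preferences) can be sketched as a weighted score. This is only an illustration under stated assumptions: the weights, the `max_km` normalisation, the dictionary layout of a place, and the toy data are all hypothetical, and no external geo-service is called.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))


def score(place, user_pos, prefs, weights=(0.4, 0.3, 0.3), max_km=10.0):
    """Weighted sum of proximity, popularity, and preference, each in [0, 1].

    The weights and max_km are assumed values, not taken from the paper.
    """
    d = haversine_km(user_pos[0], user_pos[1], place["lat"], place["lon"])
    proximity = max(0.0, 1.0 - d / max_km)  # 1 at the user's position, 0 beyond max_km
    preference = prefs.get(place["category"], 0.0)
    w_dist, w_pop, w_pref = weights
    return w_dist * proximity + w_pop * place["popularity"] + w_pref * preference


def rank(places, user_pos, prefs):
    """Order places from highest to lowest score."""
    return sorted(places, key=lambda p: score(p, user_pos, prefs), reverse=True)


# Toy data, invented for illustration only.
user_pos = (43.600, 1.440)
prefs = {"cafe": 0.9, "museum": 0.2}
places = [
    {"name": "Museum B", "lat": 43.650, "lon": 1.500, "popularity": 0.9, "category": "museum"},
    {"name": "Cafe A", "lat": 43.601, "lon": 1.441, "popularity": 0.5, "category": "cafe"},
]
ranked = rank(places, user_pos, prefs)
```

In this toy case the nearby cafe outranks the more popular but distant museum, because proximity and the user's stated preference together outweigh popularity under the assumed weights.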
Ingénierie des Systèmes d'Information | 2013
Karen Pinel-Sauvagnat; Josiane Mothe
In this paper, we review approaches for evaluating information retrieval systems using test collections. We first give the definition of a test collection and present the main metrics used in the literature to evaluate systems. We then show, through three examples (search results clustering, automatic summarization and image retrieval), the variety of the existing evaluation frameworks. RÉSUMÉ (translated from the French). This article presents an overview of the evaluation of information retrieval systems based on test collections. We first describe what a test collection is, along with the associated evaluation measures. We then examine evaluation issues through three specific information retrieval settings (document clustering, automatic summarization, and query-by-example image retrieval), and show the variety of existing measures and test collections.