Fidelia Ibekwe-SanJuan
University of Lyon
Publication
Featured research published by Fidelia Ibekwe-SanJuan.
Journal of the Association for Information Science and Technology | 2010
Chaomei Chen; Fidelia Ibekwe-SanJuan; Jianhua Hou
A multiple-perspective co-citation analysis method is introduced for characterizing and interpreting the structure and dynamics of co-citation clusters. The method facilitates analytic and sense-making tasks by integrating network visualization, spectral clustering, automatic cluster labeling, and text summarization. Co-citation networks are decomposed into co-citation clusters. The interpretation of these clusters is augmented by automatic cluster labeling and summarization. The method focuses on the interrelations between a co-citation cluster's members and their citers. The generic method is applied to a three-part analysis of the field of Information Science as defined by 12 journals published between 1996 and 2008: 1) a comparative author co-citation analysis (ACA), 2) a progressive ACA of a time series of co-citation networks, and 3) a progressive document co-citation analysis (DCA). Results show that the multiple-perspective method increases the interpretability and accountability of both ACA and DCA networks.
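As a rough illustration of the co-citation networks this abstract refers to, the sketch below counts how often two references are cited together by the same citing paper. The toy data and the function name `cocitation_counts` are assumptions made for illustration; the published method goes further, adding spectral clustering, cluster labeling, and summarization, none of which is shown here.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_to_refs):
    """Count, for each unordered pair of references, how many citing papers cite both."""
    counts = Counter()
    for refs in citing_to_refs.values():
        # De-duplicate and sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data: citing paper -> references it cites.
toy = {
    "p1": ["A", "B", "C"],
    "p2": ["A", "B"],
    "p3": ["B", "C"],
}
counts = cocitation_counts(toy)
```

The resulting pair counts can serve as edge weights of a co-citation network, which is the kind of object a clustering step would then decompose.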
Visual Analytics Science and Technology | 2006
Chaomei Chen; Fidelia Ibekwe-SanJuan; Eric SanJuan; Chris Weaver
Understanding the nature and dynamics of conflicting opinions is a profound and challenging issue. In this paper we address several aspects of the issue through a study of more than 3,000 Amazon customer reviews of the controversial bestseller The Da Vinci Code, including 1,738 positive and 918 negative reviews. The study is motivated by critical questions such as: what are the differences between positive and negative reviews? What is the origin of a particular opinion? How do these opinions change over time? To what extent can differentiating features be identified from unstructured text? How accurately can these features predict the category of a review? We first analyze terminology variations in these reviews in terms of syntactic, semantic, and statistical associations identified by TermWatch and use term variation patterns to depict underlying topics. We then select the most predictive terms based on log likelihood tests and demonstrate that this small set of terms classifies over 70% of the conflicting reviews correctly. This feature selection process reduces the dimensionality of the feature space from more than 20,000 dimensions to a couple of hundred. We utilize automatically generated decision trees to facilitate the understanding of conflicting opinions in terms of these highly predictive terms. This study also uses a number of visualization and modeling tools to identify not only what positive and negative reviews have in common, but also how they differ and evolve over time.
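The abstract does not say which log likelihood test was used. As one plausible reading, the sketch below computes the G2 log-likelihood ratio statistic on a 2x2 table of term presence against review polarity; a higher score means the term separates the two classes more strongly. The function name and the counts in the example are hypothetical.

```python
import math


def log_likelihood_g2(k_pos, n_pos, k_neg, n_neg):
    """G2 statistic for a term found in k_pos of n_pos positive reviews
    and k_neg of n_neg negative reviews (assumes n_pos + n_neg > 0)."""
    observed = [
        [k_pos, n_pos - k_pos],
        [k_neg, n_neg - k_neg],
    ]
    total = n_pos + n_neg
    row_sums = [n_pos, n_neg]
    col_sums = [k_pos + k_neg, total - k_pos - k_neg]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing (0 * ln 0 is taken as 0)
                expected = row_sums[i] * col_sums[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Hypothetical counts: a term that is evenly spread versus one that is skewed.
even = log_likelihood_g2(50, 100, 50, 100)
skewed = log_likelihood_g2(90, 100, 10, 100)
```

Ranking all candidate terms by such a score and keeping the top few hundred is one simple way to obtain the kind of dimensionality reduction the abstract describes.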
Information Processing and Management | 2006
Eric SanJuan; Fidelia Ibekwe-SanJuan
We consider a challenging clustering task: the clustering of multi-word terms without document co-occurrence information in order to form coherent groups of topics. For this task, we developed a methodology taking as input multi-word terms and lexico-syntactic relations between them. Our clustering algorithm, named CPCL, is implemented in the TermWatch system. We compared CPCL to other existing clustering algorithms, namely hierarchical and partitioning (k-means, k-medoids). This out-of-context clustering task led us to adapt multi-word term representation for statistical methods and also to refine an existing cluster evaluation metric, the editing distance, in order to evaluate the methods. Evaluation was carried out on a list of multi-word terms from the genomic field which comes with a hand-built taxonomy. Results showed that while k-means and k-medoids obtained good scores on the editing distance, they were very sensitive to term length. CPCL, on the other hand, obtained a better cluster homogeneity score and was less sensitive to term length. Also, CPCL showed good adaptability for handling very large and sparse matrices.
Meeting of the Association for Computational Linguistics | 1998
Fidelia Ibekwe-SanJuan
After extracting terms from a corpus of titles and abstracts in English, syntactic variation relations are identified amongst them in order to detect research topics. Three types of syntactic variations were studied: permutation, expansion and substitution. These syntactic variations yield other relations of formal and conceptual nature. Based on a distinction of the variation relations according to the grammatical function affected in a term - head or modifier - term variants are first clustered into connected components which are in turn clustered into classes. These classes relate two or more components through variations involving a change of head word, thus of topic. The graph obtained reveals the global organisation of research topics in the corpus. A clustering method has been built to compute such classes of research topics.
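The first clustering stage mentioned in this abstract, grouping term variants into connected components, can be sketched with a small union-find structure. The variation links below are hypothetical and are assumed to have been identified already; detecting permutation, expansion, and substitution relations is outside this sketch, as is the second stage that merges components into classes.

```python
def connected_components(links):
    """Group terms into connected components given undirected variation links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    # Sort for a deterministic order of components.
    return sorted(groups.values(), key=lambda group: sorted(group))


# Hypothetical modifier variations (the head word is unchanged in each link).
links = [
    ("wheat flour", "soft wheat flour"),
    ("soft wheat flour", "hard wheat flour"),
    ("bread dough", "frozen bread dough"),
]
components = connected_components(links)
```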
Computer Speech & Language | 2005
Eric SanJuan; James Dowdall; Fidelia Ibekwe-SanJuan; Fabio Rinaldi
This paper presents a three-level structuring of multiword terms based on lexical inclusion, WordNet similarity and a clustering approach. Term clustering by automatic data analysis methods offers an interesting way of organizing a domain's knowledge structure, useful for several information-oriented tasks like science and technology watch, text mining, computer-assisted ontology population, and Question Answering (Q-A). This paper explores how this three-level term structuring brings to light the knowledge structures from a corpus of genomics and compares the mapping of the domain topics against a hand-built ontology (the GENIA ontology). Ways of integrating the results into a Q-A system are discussed.
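The abstract does not define lexical inclusion. One simple reading, assumed here, is that a shorter term is lexically included in a longer one when its words occur as a contiguous sequence inside it. The function name and example terms are hypothetical, and the paper's actual definition may be stricter or more linguistically informed.

```python
def lexically_included(short_term, long_term):
    """True if the words of short_term occur contiguously inside the longer long_term."""
    short = short_term.lower().split()
    long_ = long_term.lower().split()
    # Require proper inclusion: the containing term must be strictly longer.
    if not short or len(short) >= len(long_):
        return False
    span = len(short)
    return any(long_[i:i + span] == short for i in range(len(long_) - span + 1))
```

For example, under this reading "T cell" is included in "human T cell line", while a term is never included in itself.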
Meeting of the Association for Computational Linguistics | 2003
James Dowdall; Fabio Rinaldi; Fidelia Ibekwe-SanJuan; Eric SanJuan
Question Answering provides a method of locating precise answers to specific questions, but in technical domains the number of Multi-Word Terms complicates this task. This paper outlines the Question Answering task in such a domain and explores two ways of detecting relations between Multi-Word Terms. The first targets specific semantic relations, the second uses a clustering algorithm, but they are both based on the idea of syntactic variation. The paper demonstrates how the combination of these two methodologies provides sophisticated access to technical domains.
Journal of the Association for Information Science and Technology | 2012
Fidelia Ibekwe-SanJuan
The French conception of information science is often contrasted with the Anglophone one, which is perceived as different and rooted mainly in Shannon's mathematical theory of communication. While there is such a thing as a French conception of information science, this conception is not totally divorced from the Anglophone one. Unbeknownst to researchers from the two geographical and cultural regions, they share similar conceptions of the field and invoke similar theoretical foundations, in particular the socio-constructivist theory. There is also a convergence of viewpoints on the dual nature of information science, i.e., the fact that it is torn between two competing paradigms: objectivist and subjectivist. Technology is another area where a convergence of viewpoints is noticeable: scholars from both geographic and cultural zones display the same suspicion toward the role of technology and of computer science. It would therefore be misleading to uphold the view that Anglophone information science is essentially objectivist and technicist while the French conception is essentially social and rooted in the humanities. This paper highlights converging analyses from authors based in both linguistic and geographical regions with the aim of fostering a better understanding of the challenges that information science is facing worldwide and of helping to trace a path to how the global information science community can try to meet them.
Proceedings of the American Society for Information Science and Technology | 2014
Sachi Arafat; Michael K. Buckland; Melanie Feinberg; Fidelia Ibekwe-SanJuan; Ryan Shaw; Julian Warner
The field of LIS is beset by recurrent debates as to its disciplinary status. For decades, the interdisciplinary nature of information science has been upheld without much proof from the ground. But if LIS is not an interdiscipline, is it then a meta-, a trans-, a pluri-, a multi- or simply a discipline? The different proposals for qualifying the nature of LIS or for delineating its frontiers suggest that its fundamental nature remains unclear for its community. But is LIS alone in this dilemma and does it really matter? Does it stop the field from progressing?
Advances in Focused Retrieval | 2009
Fidelia Ibekwe-SanJuan; Eric SanJuan
This paper reports our participation in the INEX 2008 Ad-Hoc Retrieval track. We investigated the effect of multiword terms on retrieval effectiveness in an interactive query expansion (IQE) framework. The IQE approach is compared to a state-of-the-art IR engine (in this case Indri) implementing a bag-of-words query and document representation, coupled with pseudo-relevance feedback (automatic query expansion, AQE). The performance of multiword query and document representation was enhanced when the term structure was relaxed to accept the insertion of additional words while preserving the original structure and word order. The search strategies built with multiword terms coupled with QE obtained very competitive scores in the three Ad-Hoc tasks: Focused retrieval, Relevant-in-Context and Best-in-Context.
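The relaxed term structure described in this abstract, where extra words may be inserted as long as word order is preserved, can be illustrated with a small matcher. This is a sketch under stated assumptions: the function name, the whitespace tokenization, and the `max_insertions` parameter are all hypothetical, and it does not reproduce how the authors expressed such queries in Indri.

```python
def matches_relaxed(term, text, max_insertions=2):
    """True if the words of `term` occur in `text` in order, allowing up to
    `max_insertions` extra words between consecutive term words."""
    words = term.lower().split()
    tokens = text.lower().split()  # naive tokenization; punctuation is not handled
    if not words:
        return False

    def match_from(word_idx, token_idx):
        if word_idx == len(words):
            return True  # every term word has been matched
        # The next term word may follow 0..max_insertions inserted tokens.
        stop = min(len(tokens), token_idx + max_insertions + 1)
        return any(
            tokens[j] == words[word_idx] and match_from(word_idx + 1, j + 1)
            for j in range(token_idx, stop)
        )

    return any(
        tokens[i] == words[0] and match_from(1, i + 1)
        for i in range(len(tokens))
    )
```

With the default setting, "query expansion" matches both "interactive query expansion framework" and "query term expansion", but not "expansion of the query", where the word order is reversed.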
Applications of Natural Language to Data Bases | 2007
Eric SanJuan; Fidelia Ibekwe-SanJuan; Juan-Manuel Torres-Moreno; Patricia Velázquez-Morales
In this paper, we target document ranking in a highly technical field with the aim of approximating a ranking that is obtained through an existing ontology (knowledge structure). We test and combine symbolic and vector space models (VSM). Our symbolic approach relies on shallow NLP and on internal linguistic relations between Multi-Word Terms (MWTs). Documents are ranked based on different semantic relations they share with the query terms, either directly or indirectly after clustering the MWTs using the identified lexico-semantic relations. The VSM approach consisted in ranking documents with different functions ranging from the classical tf.idf to more elaborate similarity functions. Results show that the ranking obtained by the symbolic approach performs better on most queries than the vector space model. However, the ranking obtained by combining both approaches outperforms by a wide margin the results obtained by methods from each approach.
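For readers unfamiliar with the classical tf.idf baseline named in this abstract, the sketch below ranks documents by the sum of tf × idf over the query words. It is a textbook variant chosen for brevity (raw term frequency, natural-log idf, whitespace tokenization), not necessarily the weighting used in the paper; the function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def rank_tfidf(query, docs):
    """Return document indices ordered by decreasing tf.idf score for the query."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scored = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = sum(
            tf[word] * math.log(n_docs / df[word])
            for word in query.lower().split()
            if df[word]  # skip query words absent from the collection
        )
        scored.append((score, idx))
    # Highest score first; ties broken by original document order.
    return [idx for score, idx in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical toy collection.
docs = [
    "protein binding site",
    "gene expression in yeast",
    "protein protein interaction",
]
ranking = rank_tfidf("protein interaction", docs)
```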