Annie Morin
University of Rennes
Publications
Featured research published by Annie Morin.
Ninth International Conference on Information Visualisation (IV'05) | 2005
Nicolas Bonnel; Alexandre Cotarmanac'h; Annie Morin
While searching the Web, the user is often confronted with a great number of results, generally sorted by rank and displayed as a succession of ordered lists. Facing the limits of this approach, we propose a prototype that explores new ways of organizing and presenting search results, as well as new types of interaction with them, in order to make their exploration more intuitive and efficient. The main topic of this paper is the processing of results coming from an information retrieval system. Although relevance depends on result quality, effective result processing is an alternative way to improve relevance for the user. Given current expectations, this processing consists of an organization step and a visualization step. The proposed prototype organizes the results according to their meaning using a Kohonen self-organizing map, and visualizes them in a 3D scene to enlarge the representation space. The 3D metaphor proposed here is a city.
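The Kohonen self-organizing map at the heart of the prototype can be sketched compactly. The following minimal NumPy version is only an illustration of the mechanism (grid size, learning-rate and neighborhood schedules are our own illustrative choices, not the paper's settings): it maps high-dimensional result vectors onto a 2-D grid of cells, which is the kind of layout the city metaphor then renders in 3D.

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a minimal 2-D Kohonen self-organizing map on row vectors."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))          # codebook vectors
    # grid coordinates of each cell, used by the neighborhood function
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                  indexing="ij"), axis=-1)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                      # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 1e-3         # shrinking neighborhood
        for x in rng.permutation(data):
            # best-matching unit: cell whose codebook vector is closest to x
            d = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(d), (h, w))
            # Gaussian neighborhood around the BMU on the grid
            g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1)
                       / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
    return weights

def best_matching_unit(weights, x):
    """Grid cell where a vector x lands on the trained map."""
    d = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)
```

Each search result, embedded as a vector, would be assigned to its best-matching unit; results landing in the same or neighboring cells end up close together in the scene.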
Portuguese Conference on Artificial Intelligence | 2007
Artur Šilić; Jean-Hugues Chauchat; Bojana Dalbelo Bašić; Annie Morin
In this paper, we compare n-grams and morphological normalization, two inherently different text-preprocessing methods used for text classification, on a Croatian-English parallel corpus. Our approach to comparing different text-preprocessing techniques is based on measuring computational performance (execution time and memory consumption) as well as classification performance. We show that although n-grams achieve classifier performance comparable to traditional word-based feature extraction and can act as a substitute for morphological normalization, they are computationally much more demanding.
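Character n-gram extraction, the language-independent alternative to morphological normalization compared above, is simple to sketch. A minimal version (lowercasing and space-padding at word boundaries are common conventions, not necessarily the paper's exact setup):

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Overlapping character n-grams, with spaces marking word boundaries."""
    padded = f" {text.strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def ngram_vector(text, n=3):
    """Bag-of-n-grams feature vector for a document."""
    return Counter(char_ngrams(text.lower(), n))
```

Because no stemmer or lemmatizer is involved, the same code serves Croatian and English alike; the cost is a much larger feature space, which is the computational drawback the paper measures.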
Information Technology Interfaces | 2004
Annie Morin
With the huge amount of available textual data, we need convenient ways to process it and to extract valuable information. It appears that factorial correspondence analysis recovers most of the information contained in the data. Moreover, even after processing, a large amount of material remains, and we need visualization tools to display it. In this paper, we show how to use correspondence analysis in a sensible way, and we give an application to the analysis of the internal scientific production of a major research center in France: INRIA, the French national institute for research in computer science and control.
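In matrix terms, correspondence analysis reduces to a singular value decomposition of the standardized residuals of a contingency table (e.g. a words-by-documents count table). A minimal NumPy sketch of this standard computation, not the authors' implementation:

```python
import numpy as np

def correspondence_analysis(table):
    """Correspondence analysis of a contingency table via SVD of the
    standardized residuals. Returns principal coordinates for rows and
    columns, plus the share of total inertia carried by each axis."""
    P = table / table.sum()
    r = P.sum(axis=1)                       # row masses
    c = P.sum(axis=0)                       # column masses
    # standardized residuals: departure from the independence model r c^T
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]   # row principal coordinates
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]  # column principal coordinates
    inertia = sv ** 2 / (sv ** 2).sum()
    return rows, cols, inertia
```

Plotting the first two columns of `rows` and `cols` on the same axes gives the simultaneous display of rows and columns that makes the method useful for exploring such tables.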
Artificial Intelligence in Medicine in Europe | 2009
Lan Umek; Blaž Zupan; Marko Toplak; Annie Morin; Jean-Hugues Chauchat; Gregor Makovec; Dragica Smrke
Biomedical experimental data sets often include many features both at input (description of cases, treatments, or experimental parameters) and at output (outcome description). State-of-the-art data mining techniques can deal with such data, but consider only one output feature at a time, disregarding any dependencies among them. In this paper, we propose a technique that treats many output features simultaneously, aiming at finding subgroups of cases that are similar in both input and output space. The method is based on k-medoids clustering and analysis of contingency tables, and reports case subgroups with significant dependency between input and output space. We have used this technique in an explorative analysis of clinical data on femoral neck fractures. The subgroups discovered in our study were considered meaningful by the participating domain expert and sparked a number of ideas for hypotheses to be tested experimentally.
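The clustering half of such a method can be illustrated with a plain k-medoids routine over a precomputed distance matrix; the contingency-table significance analysis that the paper couples with it is omitted here. This is a simple alternating variant (not the full PAM swap search, and not the authors' code):

```python
import numpy as np

def k_medoids(dist, k, iters=20, seed=0):
    """Simple alternating k-medoids on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(dist.shape[0], size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(dist[:, medoids], axis=1)     # nearest medoid
        new = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                # new medoid: member minimizing total distance within cluster
                within = dist[np.ix_(members, members)].sum(axis=1)
                new[j] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels
```

Because it only needs a distance matrix, the same routine can run on a distance that mixes input and output features, which is what allows subgroups to be coherent in both spaces at once.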
Journal of Classification | 2014
Mónica Bécue-Bertaut; Belchin Kostov; Annie Morin; Guilhem Naro
Rhetorical strategy is relevant in the law domain, where language is a vital instrument. Textual statistics have much to offer for uncovering such a strategy. We propose a methodology that starts from an unstructured text: first, breakpoints are automatically detected and lexically homogeneous parts are identified; then, the shape of the text, through the trajectory of these parts and their hierarchical structure, is uncovered; finally, the argument flow is tracked throughout. Several methods are combined. Chronological clustering of multidimensional count series detects the breakpoints; the shape of the text is revealed by applying correspondence analysis to the parts × words table, while the progression of the argument is described by labelled time-constrained hierarchical clustering. The methodology is illustrated on a forensic rhetoric application, specifically a closing speech delivered by a prosecutor at the Barcelona Criminal Court. The approach could also be useful in politics, communication, and professional writing.
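The key idea of time-constrained clustering is that only adjacent parts of the text may merge, so every cluster is a contiguous, chronological stretch of the speech. A toy contiguity-constrained agglomerative sketch of that idea alone (the merge criterion and significance tests of the actual chronological-clustering method are omitted):

```python
def chronological_clustering(parts, k):
    """Greedy contiguity-constrained agglomerative clustering: only
    adjacent parts may merge, so every cluster is a contiguous span."""
    # each cluster: [start, end, centroid, size] over consecutive parts
    clusters = [[i, i, list(v), 1] for i, v in enumerate(parts)]

    def gap(a, b):                       # squared distance between centroids
        return sum((x - y) ** 2 for x, y in zip(a[2], b[2]))

    while len(clusters) > k:
        # merge the closest pair of *neighboring* clusters
        i = min(range(len(clusters) - 1),
                key=lambda j: gap(clusters[j], clusters[j + 1]))
        a, b = clusters[i], clusters[i + 1]
        n = a[3] + b[3]
        clusters[i:i + 2] = [[a[0], b[1],
                              [(a[3] * x + b[3] * y) / n
                               for x, y in zip(a[2], b[2])],
                              n]]
    return [(c[0], c[1]) for c in clusters]
```

Run on lexical count profiles of successive text parts, the boundaries between the returned spans play the role of the breakpoints between lexically homogeneous sections.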
Intelligent Data Analysis | 2009
Sasa Petrovic; Bojana Dalbelo Bašić; Annie Morin; Blaž Zupan; Jean-Hugues Chauchat
Explorative data analysis in text mining relies essentially on effective visualization techniques that can expose hidden relationships among documents and reveal correspondences between documents and their features. In text mining, documents are most often represented by feature vectors of very high dimension, requiring dimensionality reduction to obtain visual projections in two- or three-dimensional space. Correspondence analysis is an unsupervised approach that constructs a low-dimensional projection space with simultaneous placement of both documents and features, making it ideal for explorative analysis in text mining. Its use to date, however, has been limited to word-based features. In this paper, we investigate how this document representation compares to representations based on letter n-grams and word n-grams, and find that these alternative representations yield better results in separating documents of different classes. We perform our experimental analysis on a bilingual Croatian-English parallel corpus, which additionally allows us to explore the impact of features in different languages on the quality of the visualizations.
Expert Systems With Applications | 2012
Artur Šilić; Annie Morin; Jean-Hugues Chauchat; Bojana Dalbelo Bašić
In this paper, we present CatViz (Temporally-Sliced Correspondence Analysis Visualization), a novel method that visualizes relationships through time and is suitable for large-scale temporal multivariate data. We couple CatViz with clustering methods and introduce the concept of final centroid transfer, which establishes the correspondence of clusters over time. Although CatViz can be used on any type of temporal data, we show how it can be applied to the exploratory visual analysis of text collections. We present a successful concept of feature-type filtering to present different aspects of textual data. We performed case studies on large collections of French and English news articles. In addition, we conducted a user study that confirms the usefulness of our method. We present typical tasks of exploratory text analysis and discuss application procedures that an analyst might perform. We believe that CatViz is general and highly applicable to large data sets because of its intuitiveness, effectiveness, and robustness. We expect that it will enable a better understanding of texts in large historical archives.
EGC (best of volume) | 2010
Nguyen-Khang Pham; Annie Morin; Patrick Gros; Quyet-Thang Le
In this paper, we investigate the intensive use of Correspondence Analysis (CA) for large-scale content-based image retrieval. Correspondence analysis is a useful method for analyzing textual data, and we adapt it to images using SIFT local descriptors. CA is used to reduce dimensionality and to limit the number of images considered during the search step. An incremental algorithm for CA is proposed to deal with large databases while giving exactly the same results as the standard algorithm. We also integrate the Contextual Dissimilarity Measure into our search scheme in order to improve response time and accuracy. We explore this integration in two ways: (i) off-line (the structure of image neighborhoods is corrected off-line) and (ii) on the fly (the structure of image neighborhoods is adapted during the search). Evaluation tests were performed on a large image database (up to 1 million images).
Computer Analysis of Images and Patterns | 2009
Nguyen-Khang Pham; Annie Morin; Patrick Gros
We are interested in the intensive use of Factorial Correspondence Analysis (FCA) for large-scale content-based image retrieval. Factorial correspondence analysis is a useful method for analyzing textual data, and we adapt it to images using SIFT local descriptors. FCA is used to reduce dimensionality and to limit the number of images considered during the search. Graphics Processing Units (GPUs) are fast emerging as inexpensive parallel processors thanks to their high computational power and low price. The G80 family of Nvidia GPUs provides the CUDA programming model, which treats the GPU as a SIMD processor array. We present two very fast GPU algorithms for image retrieval using FCA: the first is a parallel incremental algorithm for FCA, and the second is an extension of the filtering algorithm from our previous work for the filtering step. Our implementation scales up the FCA computation by a factor of 30 compared to the CPU version. For retrieval tasks, the parallel GPU version performs 10 times faster than the CPU one. Retrieving images in a database of 1 million images takes about 8 milliseconds.
International Conference on Multimedia and Expo | 2004
Anicet Kouomou-Choupo; Laure Berti-Equille; Annie Morin
The administration of very large image collections accentuates the classical problems of indexing and of querying information efficiently. This paper describes a new method, applied to very large still-image databases, that combines two data mining techniques, clustering and association rule mining, in order to better organize image collections and to improve query performance. The objective of our work is to exploit association rules discovered by mining global MPEG-7 feature data and to adapt query processing accordingly. In our experiment, we use five MPEG-7 features to describe several thousand still images. For each feature, we first determine several clusters of images using a k-means algorithm. Then, we generate association rules between clusters of different features and exploit these rules to rewrite queries and optimize query-by-content processing.
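The per-feature clustering step can be sketched with a standard Lloyd's k-means; this is our own minimal version, not the paper's implementation, and the rows of X stand in for any numeric descriptor vectors such as global MPEG-7 features:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's k-means over the rows of X (one row per image)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign every point to its nearest center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
    return centers, labels
```

Running this once per descriptor yields, for every image, one cluster label per feature; the association rules are then mined over those per-feature labels to predict which clusters a query should be routed to.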