Jose G. Moreno
University of Caen Lower Normandy
Publications
Featured research published by Jose G. Moreno.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2014
Jose G. Moreno; Gaël Dias; Guillaume Cleuziou
Several important studies in Web search results clustering have recently shown improved performance driven by the use of external resources. Following this trend, we present a new algorithm called Dual C-Means, which provides a theoretical framework for clustering in different representation spaces. Its originality lies in the fact that external resources can drive both the clustering process and the labeling task in a single step. To validate our hypotheses, a series of experiments is conducted over different standard datasets, in particular over a new dataset built from the TREC Web Track 2012 to take query log information into account. The comprehensive empirical evaluation demonstrates significant advantages of the proposed approach over traditional clustering and labeling techniques.
North American Chapter of the Association for Computational Linguistics | 2016
Guillaume Cleuziou; Jose G. Moreno
This paper presents our participation in SemEval "Task 13: Taxonomy Extraction Evaluation (TExEval-2)" (Bordea et al., 2016). This year, we propose combining recent semantic vector representations in a methodology for the semi-supervised and auto-supervised acquisition of lexical taxonomies from raw texts. In our proposal, similarities between concepts are first calculated using semantic vectors; then a pretopological space is defined, from which a preliminary structure is constructed. Finally, a genetic algorithm is used to optimize two objective functions: the quality of the relations added to the taxonomy and the quality of the overall structure. Experiments show that our proposal performs competitively against the other participants, achieving second place in the overall ranking.
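The similarity and pretopology steps can be illustrated with a minimal sketch. It assumes, purely for illustration, that each concept's neighborhood V(x) is the set of its k most cosine-similar concepts and that the pseudo-closure a(A) adds every concept whose neighborhood meets A; the neighborhood definition used in the paper, and the way its preliminary structure is derived from closures, may differ. The names `knn_neighborhoods`, `pseudo_closure` and `closure` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarities between the row vectors (concept embeddings)."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return unit @ unit.T


def knn_neighborhoods(sim, k):
    """Hypothetical V(x): the k concepts most similar to x (x itself excluded)."""
    masked = np.array(sim, dtype=float)
    np.fill_diagonal(masked, -np.inf)  # a concept is not its own neighbor
    return [set(np.argsort(-row)[:k].tolist()) for row in masked]


def pseudo_closure(subset, neighborhoods):
    """a(A) = A | {x : V(x) meets A}; satisfies a(empty) = empty and A <= a(A)."""
    subset = set(subset)
    return subset | {x for x, v in enumerate(neighborhoods) if v & subset}


def closure(subset, neighborhoods):
    """Apply a(.) until a fixed point; terminates because the set only grows."""
    current = set(subset)
    while True:
        expanded = pseudo_closure(current, neighborhoods)
        if expanded == current:
            return current
        current = expanded


if __name__ == "__main__":
    # Toy embeddings: concepts 0/1 are close, as are concepts 2/3.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    nb = knn_neighborhoods(cosine_similarity_matrix(emb), k=1)
    print(closure({0}, nb))  # {0, 1}
```

Because V(x) is built from nearest neighbors, the relation is generally asymmetric, which is what lets closures of different concepts nest rather than collapse into plain connected components.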
Pattern Recognition | 2015
Guillaume Cleuziou; Jose G. Moreno
This paper deals with point symmetry-based clustering, whose goal is to retrieve clusters with a point-symmetric shape from a data set. Prototype-based algorithms are considered, and a non-trivial generalization to kernel methods is proposed, made possible by the geometric properties satisfied by the point symmetry distances proposed so far. The proposed kernelized framework offers new opportunities to deal with non-Euclidean symmetries and to revisit previously intractable examples by means of implicit feature spaces. An extensive experimental study on artificial data sets brings out the capabilities and the limits of current point symmetry-based clustering methods. It shows that kernel methods can push back the current limits for this task, and it encourages new research on kernel selection in order to design a fully unsupervised symmetric pattern recognition process.
Highlights: generalization (by kernelization) of a family of point symmetry distances; Kernelized SBKM (KSBKM), which offers new possibilities for point symmetry-based clustering; empirical recognition of symmetric clusters using any proximity measure; new simple examples that are hard for the original methods to handle; new complex examples handled well by KSBKM through implicit projections.
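A common point symmetry distance, the one used in symmetry-based k-means (SBKM, after Su and Chou), scores how well a point x_j is mirrored by some other point x_i about a center c: d_s(x_j, c) = min over i != j of ||(x_j - c) + (x_i - c)|| / (||x_j - c|| + ||x_i - c||). Since it involves only norms, it can be expanded into inner products and therefore into kernel evaluations. The sketch below assumes the feature-space center is the mean of the mapped cluster members; it illustrates this kernel trick and is not the paper's KSBKM implementation. Function names are hypothetical.

```python
import numpy as np


def point_symmetry_distance(X, j, c):
    """Euclidean point symmetry distance of X[j] w.r.t. center c (min over i != j)."""
    X = np.asarray(X, dtype=float)
    dj = X[j] - c
    best = np.inf
    for i in range(len(X)):
        if i == j:
            continue
        di = X[i] - c
        denom = np.linalg.norm(dj) + np.linalg.norm(di)
        if denom > 0:
            best = min(best, np.linalg.norm(dj + di) / denom)
    return best


def kernel_point_symmetry_distance(K, j, members):
    """Same distance in feature space, using only Gram-matrix entries K[a, b].

    Assumption: the center is c = mean of phi(x_k) over k in `members`.
    """
    K = np.asarray(K, dtype=float)
    m = np.asarray(members)
    k_c = K[:, m].mean(axis=1)    # <phi(x_i), c> for every i
    c_c = K[np.ix_(m, m)].mean()  # <c, c>
    # ||phi(x_i) - c|| for every i (clip tiny negatives caused by rounding)
    norm_to_c = np.sqrt(np.maximum(np.diag(K) - 2.0 * k_c + c_c, 0.0))
    best = np.inf
    for i in range(K.shape[0]):
        if i == j:
            continue
        # ||phi(x_j) + phi(x_i) - 2c||^2 expanded into inner products
        sq = K[j, j] + K[i, i] + 2.0 * K[i, j] - 4.0 * (k_c[i] + k_c[j]) + 4.0 * c_c
        denom = norm_to_c[j] + norm_to_c[i]
        if denom > 0:
            best = min(best, np.sqrt(max(sq, 0.0)) / denom)
    return best


if __name__ == "__main__":
    X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.2, -1.8]])
    members = [0, 1, 2, 3]
    # With the linear kernel K = X X^T both functions compute the same value.
    print(point_symmetry_distance(X, 2, X[members].mean(axis=0)))
    print(kernel_point_symmetry_distance(X @ X.T, 2, members))
```

Swapping `X @ X.T` for any other positive semi-definite Gram matrix (for example an RBF kernel) measures symmetry in the implicit feature space instead of the input space.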
Conference of the European Chapter of the Association for Computational Linguistics | 2014
Jose G. Moreno; Gaël Dias
This work discusses the evaluation of baseline algorithms for Web search results clustering. An analysis is performed over frequently used baseline algorithms and standard datasets. Our work shows that competitive results can be obtained either by fine-tuning or by performing cascade clustering over well-known algorithms. In particular, the latter strategy can lead to a scalable, real-world solution whose results are comparable to those of recent text-based state-of-the-art algorithms.
ACM/IEEE Joint Conference on Digital Libraries | 2014
Jose G. Moreno; Gaël Dias
Word Sense Induction is an open problem in Natural Language Processing. Many recent works have addressed this problem with a wide spectrum of strategies based on content analysis. In this paper, we present a sense induction strategy based exclusively on link analysis over the Web. In particular, we explore the idea that the main senses of a given word share similar linking properties and can therefore be found by clustering with link-based similarity metrics. The evaluation results show that PageRank-based sense induction achieves encouraging results compared with state-of-the-art content-based algorithms in the context of Web Search Results Clustering.
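PageRank, the link-analysis score named in the abstract, can be computed by power iteration. The sketch below is the textbook formulation, with a damping factor of 0.85 by convention and uniform redistribution from pages that have no out-links; the abstract does not say how the paper turns these scores into link-based similarities for sense induction, so that step is deliberately left out.

```python
import numpy as np


def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """Textbook PageRank by power iteration.

    adj[i][j] = 1 if page i links to page j. Dangling pages (no out-links)
    are treated as linking to every page uniformly.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_degree = adj.sum(axis=1)
    safe = np.where(out_degree == 0, 1.0, out_degree)
    # Row-stochastic transition matrix; dangling rows become uniform
    P = np.where(out_degree[:, None] > 0, adj / safe[:, None], 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * (P.T @ rank)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Page 2 is linked from both 0 and 1, so it should rank highest.
    links = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
    print(pagerank(links).round(3))
```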
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2015
Jose G. Moreno; Gaël Dias
B-CUBED metrics have recently been adopted for evaluating clustering results as well as in many other related tasks. However, this family of metrics is poorly suited to unbalanced datasets. This situation is extremely frequent in Web results, where class sizes follow a strongly unbalanced distribution. In this paper, we present a modified version of B-CUBED metrics to overcome this problem. Results on toy and real datasets indicate that the proposed adaptation correctly accounts for the particularities of unbalanced cases.
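For reference, standard B-CUBED precision averages, over all items, the fraction of an item's cluster that shares its gold class; recall averages the fraction of its gold class that shares its cluster. The sketch below implements only this standard version, not the adaptation proposed in the paper; the function name `bcubed` is hypothetical.

```python
from collections import Counter


def bcubed(clusters, classes):
    """Standard B-CUBED precision, recall and F1.

    clusters[e] is the system cluster of item e, classes[e] its gold class.
    """
    if len(clusters) != len(classes) or not clusters:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(clusters)
    both = Counter(zip(clusters, classes))  # items sharing cluster AND class
    cluster_size = Counter(clusters)
    class_size = Counter(classes)
    items = list(zip(clusters, classes))
    precision = sum(both[(c, g)] / cluster_size[c] for c, g in items) / n
    recall = sum(both[(c, g)] / class_size[g] for c, g in items) / n
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Unbalanced gold standard (9 vs 1) and a trivial one-cluster output.
    gold = ["a"] * 9 + ["b"]
    print(bcubed([0] * 10, gold))
```

On this 9-to-1 example, lumping every item into a single cluster yields precision 0.82, recall 1.0 and F1 of about 0.90, which shows how an unbalanced class distribution can make a trivial clustering look strong under the standard metrics.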
International Conference on Computational Linguistics | 2014
Jose G. Moreno; Rumen Moraliyski; Asma Berrezoug; Gaël Dias
This paper describes the HULTECH team's participation in Task 3 of SemEval-2014. Four different subtasks are provided to the participants, who are asked to determine the semantic similarity of cross-level test pairs: paragraph-to-sentence, sentence-to-phrase, phrase-to-word and word-to-sense. Our system adopts a unified strategy (a general-purpose system) to calculate similarity across all subtasks based on word Web frequencies. For that purpose, we define ClueWeb InfoSimba, a corpus-based cross-level similarity metric. Results show that our strategy outperforms the proposed baselines and achieves adequate to moderate results when compared with other systems.
Meeting of the Association for Computational Linguistics | 2013
Jose G. Moreno; Gaël Dias; Guillaume Cleuziou
String Processing and Information Retrieval | 2012
Gaël Dias; Jose G. Moreno; Adam Jatowt; Ricardo Campos
10th NTCIR Conference | 2013
Jose G. Moreno; Gaël Dias