Publication


Featured research published by Soto Montalvo.


Meeting of the Association for Computational Linguistics | 2006

Multilingual Document Clustering: An Heuristic Approach Based on Cognate Named Entities

Soto Montalvo; Raquel Martínez; Arantza Casillas; Víctor Fresno

This paper presents an approach for Multilingual Document Clustering in comparable corpora. The algorithm is heuristic, and its only evidence for clustering is the identification of cognate named entities between the two sides of the comparable corpus. One of the main advantages of this approach is that it does not depend on bilingual or multilingual resources. However, it does depend on being able to identify cognate named entities between the languages used in the corpus. An additional advantage is that the approach needs no information about the right number of clusters; the algorithm calculates it. We have tested this approach with a comparable corpus of news written in English and Spanish. In addition, we have compared the results with a system that translates selected document features. The results obtained are encouraging.
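
The idea in this abstract, clustering on shared cognate named entities without fixing the number of clusters in advance, can be illustrated with a minimal sketch. It is not the authors' algorithm: the cognate test (a `difflib` spelling ratio), the `min_ratio` and `min_shared` thresholds, the union-find grouping, and the toy documents are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def is_cognate(a: str, b: str, min_ratio: float = 0.8) -> bool:
    """Assumed cognate test: two entity strings count as cognates
    when their spellings are similar enough."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= min_ratio


def shared_cognates(ents_a: set[str], ents_b: set[str]) -> int:
    """Count entities in ents_a that have at least one cognate in ents_b."""
    return sum(any(is_cognate(a, b) for b in ents_b) for a in ents_a)


def cluster_by_cognates(docs: dict[str, set[str]], min_shared: int = 2) -> list[set[str]]:
    """Link documents that share at least min_shared cognate entities and
    return the connected groups. The number of clusters is not an input."""
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if shared_cognates(docs[a], docs[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for doc_id in docs:
        clusters.setdefault(find(doc_id), set()).add(doc_id)
    return list(clusters.values())


# Hypothetical documents, each reduced to its set of named entities.
docs = {
    "en1": {"Madrid", "Zapatero", "European Union"},
    "es1": {"Madrid", "Zapatero", "Unión Europea"},
    "en2": {"Brazil", "Ronaldo"},
    "es2": {"Brasil", "Ronaldo"},
}
clusters = cluster_by_cognates(docs)
```

Because grouping follows from pairwise links, the cluster count emerges from the data; a translated entity such as "Unión Europea" is simply not matched, which reflects the dependence on cognates that the abstract mentions.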


Pattern Recognition Letters | 2007

Multilingual news clustering: Feature translation vs. identification of cognate named entities

Soto Montalvo; Raquel Martínez; Arantza Casillas; Víctor Fresno

In this paper we evaluate the influence of different document representations on the results of multilingual news clustering. We aim to determine whether named entities alone are a good source of knowledge for multilingual news clustering. We compare two approaches: one based on feature translation, and another based on cognate identification. Our main contribution is to use only some categories of cognate named entities as document representation features for multilingual news clustering, without the need for translation resources. The results show that using cognate named entities as the only type of feature to represent news leads to good multilingual clustering performance, comparable to that obtained with the feature translation approach.


Text, Speech and Dialogue | 2007

Bilingual news clustering using named entities and fuzzy similarity

Soto Montalvo; Raquel Martínez; Arantza Casillas; Víctor Fresno

This paper focuses on discovering bilingual news clusters in a comparable corpus. In particular, we address the representation of the news and the calculation of the similarity between documents. We use the cognate named entities that the news items contain as their representative features. One of our main goals is to determine whether named entities alone are a good source of knowledge for multilingual news clustering. In the vector representation of the news we take into account the category of the named entities. To determine the similarity between two documents, we propose a new approach based on a fuzzy system, with a knowledge base that tries to incorporate human knowledge about the importance of each named entity category in the news. Compared with a traditional approach, ours obtains better results on a comparable corpus of news in Spanish and English.
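
To make concrete the idea that the category of a named entity can affect document similarity, here is a deliberately simplified sketch. It is a plain category-weighted Jaccard overlap, not the fuzzy system the abstract describes; the category names and the weights in `CATEGORY_WEIGHTS` are hypothetical values chosen only for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical importance of each entity category (assumption, not from the paper).
CATEGORY_WEIGHTS = {"PERSON": 0.5, "ORGANIZATION": 0.3, "LOCATION": 0.2}


def category_weighted_similarity(doc_a: dict, doc_b: dict, weights=CATEGORY_WEIGHTS) -> float:
    """Weighted mean of per-category Jaccard overlaps between two documents,
    each given as a mapping from category name to a set of entities."""
    total = sum(weights.values())
    score = sum(
        w * jaccard(doc_a.get(cat, set()), doc_b.get(cat, set()))
        for cat, w in weights.items()
    )
    return score / total


news_a = {"PERSON": {"Zapatero"}, "LOCATION": {"Madrid"}}
news_b = {"PERSON": {"Zapatero"}, "LOCATION": {"Barcelona"}}
similarity = category_weighted_similarity(news_a, news_b)
```

A fuzzy system would replace the fixed weights with linguistic rules over the per-category overlaps; the fixed-weight version only shows where category information enters the computation.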


Association for Information Science and Technology | 2017

Person Name Disambiguation in the Web Using Adaptive Threshold Clustering

Agustín D. Delgado; Raquel Martínez; Soto Montalvo; Víctor Fresno

In this article, we present a new clustering algorithm for Person Name Disambiguation in web search results. The algorithm groups web results according to the individuals they refer to. The best state‐of‐the‐art approaches require training data in order to learn thresholds for deciding when to group the webpages. However, the ambiguity level of person names on the web cannot be estimated in advance, and the results of those methods depend strongly on the thresholds obtained with the training collections. We present the concept of an adaptive threshold, which avoids the need for a prior supervised learning process and depends only on the content of the compared documents to decide whether they refer to the same person. We evaluated our approach on three datasets, obtaining results close to those of the most successful state‐of‐the‐art algorithms that require such a learning process, and outperforming algorithms that do not require it.


International Conference on Digital Information Management | 2007

Improving Web Page Clustering Through Selecting Appropriate Term Weighting Functions

Víctor Fresno; Raquel Martínez; Soto Montalvo

Web page clustering is useful for taxonomy design, information extraction, and similarity search, and it can assist in the evaluation and visualization of search engine results. Accurate clustering is therefore a goal in Web mining and Web information extraction. Besides the particular clustering algorithm, the term weighting function applied to the features selected to represent Web pages is a key aspect of the clustering task. This paper evaluates the performance of six different term weighting functions for Web pages by means of the results of a partitioning clustering algorithm. In addition, two reduction methods have been applied: (1) the weighting function itself, and (2) removing all features that occur more often than upper thresholds in the page and the collection, or less often than lower thresholds in the page and the collection. Through experiments with a collection of Web documents used in clustering research, we have determined that the best results are obtained when the term weighting function based on a fuzzy combination of criteria is used.
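
The abstract does not name the six weighting functions it evaluates. As a rough illustration of what a term weighting function does, the sketch below implements two widely used ones, binary weighting and TF-IDF; choosing these two, and the toy `pages` input, are assumptions rather than details taken from the paper.

```python
import math
from collections import Counter


def binary_weights(tokens: list[str]) -> dict[str, float]:
    """Binary weighting: every term present in the page gets weight 1.0."""
    return {term: 1.0 for term in set(tokens)}


def tf_idf_weights(pages: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF weighting: raw term frequency times log(N / document frequency)."""
    n_pages = len(pages)
    # Document frequency: in how many pages each term appears.
    doc_freq = Counter(term for page in pages for term in set(page))
    weighted = []
    for page in pages:
        term_freq = Counter(page)
        weighted.append(
            {term: tf * math.log(n_pages / doc_freq[term]) for term, tf in term_freq.items()}
        )
    return weighted


# Hypothetical tokenized pages.
pages = [["web", "page", "web"], ["web", "mining"]]
weights = tf_idf_weights(pages)
```

In this toy input, "web" appears in every page and therefore receives a TF-IDF weight of zero, which shows how a weighting function can itself act as a feature reduction step.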


Text, Speech and Dialogue | 2006

Multilingual news document clustering: two algorithms based on cognate named entities

Soto Montalvo; Raquel Martínez; Arantza Casillas; Víctor Fresno

This paper presents an approach for Multilingual News Document Clustering in comparable corpora. We have implemented two heuristic algorithms that follow the approach. Their only evidence for clustering is the identification of cognate named entities between the two sides of the comparable corpus. In addition, no information about the right number of clusters has to be provided to the algorithms. The applicability of the approach depends only on the possibility of identifying cognate named entities between the languages involved in the corpus. The main difference between the two algorithms is whether or not a monolingual clustering phase is applied first. We have tested both algorithms with a comparable corpus of news written in English and Spanish. The performance of the two algorithms differs slightly; the one that does not apply the monolingual phase achieves better results. In any case, the results obtained with both algorithms are encouraging and show that cognate named entities can provide enough knowledge to deal with multilingual clustering of news documents.


IEEE International Conference on Fuzzy Systems | 2012

Automatic cognate identification based on a fuzzy combination of string similarity measures

Soto Montalvo; Eduardo G. Pardo; Raquel Martínez; Víctor Fresno

Cognates are words in different languages that have similar spelling and meaning. The identification of cognates is very useful for many Natural Language Processing tasks, and also in the process of learning a second language. This paper presents a new approach to classifying pairs of words as either cognates/false friends or unrelated. The proposed approach uses a fuzzy system to combine complementary string similarity measures in order to improve the cognate identification task. The underlying hypothesis is that combining different string measures by applying heuristic knowledge can outperform the same measures working separately. The results obtained by the proposed system confirm this hypothesis, and the system also outperforms other systems that combine string measures using a supervised approach. As an additional contribution, we have created a bilingual test data set that includes pairs of cognates, false friends and unrelated words in Spanish and English, and that is freely available for research purposes.
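
The notion of combining complementary string similarity measures can be sketched as follows. This is not the fuzzy system described in the abstract: the two measures chosen (normalized edit distance and a Dice coefficient over character bigrams) and the plain average used to combine them are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def dice_bigrams(a: str, b: str) -> float:
    """Dice coefficient over the sets of character bigrams of each word."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        return 1.0 if a == b else 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def combined_similarity(a: str, b: str) -> float:
    """Plain average of the two measures (a stand-in for a fuzzy combination)."""
    a, b = a.lower(), b.lower()
    return (edit_similarity(a, b) + dice_bigrams(a, b)) / 2


cognate_score = combined_similarity("nation", "nación")
unrelated_score = combined_similarity("nation", "mesa")
```

The two measures disagree on pairs such as "nation" and "nación" (few edits, but only one shared bigram), which is the kind of complementarity a rule-based or fuzzy combination is meant to exploit.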


Language Resources and Evaluation | 2017

MC4WEPS: a multilingual corpus for Web people search disambiguation

Soto Montalvo; Raquel Martínez; Leonardo Campillos; Agustín D. Delgado; Víctor Fresno; Felisa Verdejo

This article introduces the MC4WEPS corpus, a new resource for evaluating Web people search disambiguation tasks. It describes the design, collection and annotation process of the corpus and the agreement between the different annotators, and it presents a baseline evaluation. The corpus is built by compiling multilingual search engine results where the queries are person names. Proper noun disambiguation is an open problem in natural language ambiguity resolution and, specifically, resolving the ambiguity of person names in Web search results is still a challenging problem. However, state-of-the-art approaches have been evaluated only with monolingual web page collections. The MC4WEPS corpus aims to provide the research community with a reference corpus for the task of disambiguating search engine results where the query is a person name shared by homonymous individuals. Two features distinguish this corpus from existing corpora for the same task: multilingualism and the inclusion of social networking websites. These characteristics make it more representative of a real search scenario, especially for evaluating person name disambiguation in a multilingual context. The article also includes detailed information about the format and the availability of the corpus.


Journal of the Association for Information Science and Technology | 2015

Exploiting named entities for bilingual news clustering

Soto Montalvo; Raquel Martínez; Víctor Fresno; Agustín D. Delgado

In this article, we present a new algorithm for clustering a bilingual collection of comparable news items in groups of specific topics. Our hypothesis is that named entities (NEs) are more informative than other features in the news when clustering fine grained topics. The algorithm does not need as input any information related to the number of clusters, and carries out the clustering only based on information regarding the shared named entities of the news items. This proposal is evaluated using different data sets and outperforms other state‐of‐the‐art algorithms, thereby proving the plausibility of the approach. In addition, because the applicability of our approach depends on the possibility of identifying equivalent named entities among the news, we propose a heuristic system to identify equivalent named entities in the same and different languages, thereby obtaining good performance.


IEEE Computer | 2015

Multilingual Information Access on the Web

Soto Montalvo; Raquel Martínez; Víctor Fresno; Rafael Capilla

Named entities (NEs) can facilitate access to multilingual knowledge sources--which have exploded in recent years--but the identification, classification, and retrieval of NEs remain challenging tasks.

Collaboration


Soto Montalvo's most frequent collaborators.

Top Co-Authors

Víctor Fresno

National University of Distance Education

Raquel Martínez

National University of Distance Education

Agustín D. Delgado

National University of Distance Education

Arantza Casillas

University of the Basque Country

Raquel Martínez-Unanue

National University of Distance Education

Abraham Duarte

King Juan Carlos University

Eduardo G. Pardo

King Juan Carlos University
