Publications


Featured research published by Sofía N. Galicia-Haro.


Text, Speech and Dialogue | 2005

Detection and correction of malapropisms in Spanish by means of internet search

Igor A. Bolshakov; Sofía N. Galicia-Haro; Alexander F. Gelbukh

Malapropisms are real-word errors that lead to syntactically correct but semantically implausible text. We report an experiment on the detection and correction of Spanish malapropisms. Malapropos words semantically destroy the collocations (syntactically connected word pairs) in which they occur. We therefore detect possible malapropisms as words that do not form semantically plausible collocations with neighboring words. As correction candidates, we select words similar to the suspected one that do form plausible collocations with the neighboring words. To judge the semantic plausibility of a collocation, we use Google statistics on the occurrences of the word combination and of each of the two words separately. Since collocation components can be separated by other words in a sentence, the Google statistics are gathered for the most probable distance between them. These statistics are then converted into a specially defined Semantic Compatibility Index (SCI). Heuristic rules are proposed to signal malapropisms when SCI values fall below a predetermined threshold and to retain a few highly SCI-ranked correction candidates. Our experiments gave promising results.
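The detection and ranking steps can be sketched in Python. The abstract does not give the SCI formula, so this sketch substitutes a pointwise-mutual-information-style score computed from hit counts; the scoring function, the zero threshold, the assumed collection size, and all counts are hypothetical stand-ins, not the authors' definitions or data.

```python
import math

def compatibility(n_pair, n_a, n_b, total):
    """PMI-style score, an assumed stand-in for the paper's SCI.

    n_pair: hits for the word combination; n_a, n_b: hits for each word alone;
    total: assumed size of the indexed collection.
    """
    if n_pair == 0 or n_a == 0 or n_b == 0:
        return float("-inf")  # unseen combination: treat as implausible
    return math.log(n_pair * total / (n_a * n_b))

def check_collocation(pair, candidates, hits, total, threshold=0.0, keep=3):
    """Flag `pair` if its score is below `threshold` and rank candidate fixes.

    `hits` maps single words and (word, word) tuples to hypothetical hit counts.
    """
    def score(p):
        return compatibility(hits.get(p, 0), hits.get(p[0], 0),
                             hits.get(p[1], 0), total)

    if score(pair) >= threshold:
        return False, []  # plausible collocation: nothing to correct
    ranked = sorted(candidates, key=score, reverse=True)
    # Keep at most `keep` candidates that themselves pass the threshold.
    return True, [c for c in ranked if score(c) >= threshold][:keep]

# Hypothetical counts: "centro histérico" (hysterical center) vs. "centro histórico".
HITS = {
    "centro": 1_000_000, "histérico": 50_000, "histórico": 400_000,
    ("centro", "histérico"): 20, ("centro", "histórico"): 300_000,
}
flagged, fixes = check_collocation(("centro", "histérico"),
                                   [("centro", "histórico")], HITS, total=10**9)
print(flagged, fixes)  # True [('centro', 'histórico')]
```

With real data the counts would come from search-engine queries, and the threshold here is simply a free parameter of the sketch.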


Text, Speech and Dialogue | 2003

Stable Coordinated Pairs in Text Processing

Igor A. Bolshakov; Alexander F. Gelbukh; Sofía N. Galicia-Haro

Stable coordinated pairs (SCPs), such as comments and suggestions, far and near, or sooner or later, occur rather frequently in various European languages, although there are only a few thousand of them. We argue that a dictionary of the SCPs of a given language, supplied with their lexical, morphological, syntactic, semantic, and pragmatic characteristics, can help in such important natural language processing tasks and applications as word sense disambiguation and parsing, as well as the detection and correction of semantic errors.


Mexican International Conference on Artificial Intelligence | 2004

Recognition of Named Entities in Spanish Texts

Sofía N. Galicia-Haro; Alexander F. Gelbukh; Igor A. Bolshakov

Proper name recognition is a subtask of Named Entity Recognition as defined in the Message Understanding Conference (MUC). For our corpus annotation, proper name recognition is crucial, since by our estimate proper names appear in more than 50% of all sentences in the electronic texts we collected for this purpose. Our work focuses on composite proper names: names with coordinated constituents, names with several prepositional phrases, and titles of songs, books, movies, etc. We describe a method based on heterogeneous knowledge and simple resources, and report preliminary results.
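A naive starting point for composite names is to group capitalized tokens together with the lowercase connectors that can appear inside them. This sketch is not the authors' method: the connector list is a hypothetical assumption, sentence-initial capitals are not handled, and it cannot tell whether a conjunction such as y joins two separate names or belongs to a single one, which is the kind of case where additional knowledge is needed.

```python
# Hypothetical list of lowercase words allowed inside a composite name.
CONNECTORS = {"de", "del", "la", "las", "los", "y", "e"}

def composite_names(tokens):
    """Group runs of capitalized tokens, keeping inner connectors like 'de'."""
    names, current, pending = [], [], []
    for tok in tokens:
        if tok[:1].isupper():
            current.extend(pending)  # connectors between capitals stay inside
            current.append(tok)
            pending = []
        elif current and tok in CONNECTORS:
            pending.append(tok)  # kept only if another capitalized token follows
        else:
            if current:
                names.append(" ".join(current))
            current, pending = [], []
    if current:
        names.append(" ".join(current))  # trailing connectors are dropped
    return names

sentence = "la Comisión Nacional de Derechos Humanos visitó la Universidad de Guadalajara ayer"
print(composite_names(sentence.split()))
# ['Comisión Nacional de Derechos Humanos', 'Universidad de Guadalajara']
```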


International Conference on Computational Linguistics | 2003

Can we correctly estimate the total number of pages in Google for a specific language?

Igor A. Bolshakov; Sofía N. Galicia-Haro

We argue that, for some applications, the total number of web pages in a specific language actually stored by an Internet search engine is relevant. We show that some elementary steps in obtaining statistics on the Google database are bewildering: simple set-theoretic operations give evidently inconsistent results. Without claiming ultimate precision, we propose a method for estimating the total number of pages in a given language at a given moment. It takes the Google page counts for the most frequent words of a representative text corpus, reorders these words, and computes maximum likelihood estimates of their contributions. Applied to Spanish, the method gives results whose theoretically calculated precision is much higher than actually needed, even though it rests on an error-prone mechanism for outputting the raw statistical data.
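The estimation idea can be illustrated with a simple statistical model. The abstract does not spell out its model, so this sketch assumes that the page count n_i reported for each frequent word is an independent Poisson variable with mean N·p_i, where p_i is the fraction of documents containing that word in a representative corpus. Setting the derivative of the log-likelihood Σ(n_i·log(N·p_i) − N·p_i) to zero gives N̂ = Σn_i / Σp_i, with standard error √(N̂ / Σp_i). The words, counts, and fractions are hypothetical, and the word-reordering step is omitted.

```python
import math

def estimate_total_pages(page_counts, doc_fractions):
    """MLE of the total page count N, assuming n_i ~ Poisson(N * p_i) independently.

    page_counts: search-engine hit counts n_i for frequent words.
    doc_fractions: share p_i of corpus documents containing each word.
    Returns (estimate, standard error, per-word estimates n_i / p_i).
    """
    if len(page_counts) != len(doc_fractions) or not page_counts:
        raise ValueError("need one document fraction per page count")
    sum_p = sum(doc_fractions)
    n_hat = sum(page_counts) / sum_p
    std_err = math.sqrt(n_hat / sum_p)  # inverse Fisher information under the model
    per_word = [n / p for n, p in zip(page_counts, doc_fractions)]
    return n_hat, std_err, per_word

# Hypothetical figures for four very frequent Spanish words (de, la, que, el).
counts = [90_000_000, 80_000_000, 70_000_000, 60_000_000]
fractions = [0.9, 0.8, 0.7, 0.6]
n_hat, se, per_word = estimate_total_pages(counts, fractions)
print(f"{n_hat:.3e} ± {se:.0f}")
```

The per-word estimates show how consistent the individual counts are; since real pages contain many of the same words, the independence assumption makes the standard error from this sketch only indicative.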


Text, Speech and Dialogue | 1999

A Simple Spanish Part of Speech Tagger for Detection and Correction of Accentuation Error

Sofía N. Galicia-Haro; Igor A. Bolshakov; Alexander F. Gelbukh

One of the most frequent kinds of typographic errors specific to Spanish involves accentuation, namely the omission of an obligatory stress mark or the insertion of a superfluous one. If such an error transforms one word into another existing word, it cannot be detected by ordinary spell-checkers, since some context analysis is necessary. A simple procedure is proposed for this task. It relies on (1) some simple heuristics that determine the linear context and (2) a small list of word pairs that differ only in the accent mark. The idea is applied to numerous nouns and adjectives, such as número, that turn into quasi-homonymous personal verb forms (numero, 'I number') when they lose their stress marks.
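The pair-list approach can be sketched with a one-word left context. The three word pairs and the determiner and pronoun sets are hypothetical illustrations, far smaller and cruder than the heuristics the abstract describes: a determiner before the word suggests the accented noun or adjective, and the pronoun yo suggests the unaccented verb form.

```python
# Hypothetical pair list: unaccented verb form -> accented noun/adjective form.
ACCENT_PAIRS = {
    "numero": "número",     # 'I number'   vs. 'number'
    "publico": "público",   # 'I publish'  vs. 'public'
    "practico": "práctico", # 'I practice' vs. 'practical'
}
VERB_FORM = {noun: verb for verb, noun in ACCENT_PAIRS.items()}
DETERMINERS = {"el", "un", "este", "ese", "mi", "su", "nuestro"}
PRONOUNS = {"yo"}

def suggest_accents(tokens):
    """Return (position, found, suggested) for likely accentuation errors."""
    suggestions = []
    for i in range(1, len(tokens)):
        prev, word = tokens[i - 1].lower(), tokens[i].lower()
        if word in ACCENT_PAIRS and prev in DETERMINERS:
            suggestions.append((i, tokens[i], ACCENT_PAIRS[word]))  # missing mark
        elif word in VERB_FORM and prev in PRONOUNS:
            suggestions.append((i, tokens[i], VERB_FORM[word]))  # superfluous mark
    return suggestions

print(suggest_accents("anota el numero de teléfono".split()))  # [(2, 'numero', 'número')]
print(suggest_accents("yo número las páginas".split()))        # [(1, 'número', 'numero')]
```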


Mexican International Conference on Artificial Intelligence | 2014

Extraction of Semantic Relations from Opinion Reviews in Spanish

Sofía N. Galicia-Haro; Alexander F. Gelbukh

We report research on extracting semantic relations to build taxonomies. State-of-the-art approaches rely on a text corpus or on the acquisition of domain texts to characterize the domain of interest accurately. We analyzed the application of unsupervised methods to ontology building using a collection of opinion reviews in Spanish and the Web. We present some results and discuss the relations obtained.


International Conference on Natural Language Processing | 2005

Web-assisted detection and correction of joint and disjoint malapropos word combinations

Igor A. Bolshakov; Sofía N. Galicia-Haro

An experiment on the Web-assisted detection and correction of malapropisms is reported. Malapropos words semantically destroy the collocations they are in, usually while retaining their syntactic links with other words. A hundred English malapropisms were gathered, each supplied with its correction candidates, i.e., word combinations in which one word is an editing variant of the corresponding word in the malapropism. Google statistics of occurrences and co-occurrences were gathered for each malapropism and correction candidate. The collocation components may be adjacent or separated by other words in a sentence, so statistics were accumulated for the most probable distance between them. The raw Google occurrence statistics were then converted into values of a specially defined Semantic Compatibility Index (SCI). Heuristic rules are proposed to signal malapropisms when SCI values fall below a predetermined threshold and to retain a few highly SCI-ranked correction candidates. Within certain limitations, the experiment gave promising results.
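The choice of the most probable distance between collocation components can be illustrated on a local corpus instead of search-engine statistics. This sketch counts, for a hypothetical word pair, how often the second word follows the first at each gap up to max_gap (1 means adjacent) and returns the most frequent gap; the corpus, the pair, and the cutoff are assumptions for illustration only.

```python
from collections import Counter

def most_probable_distance(sentences, first, second, max_gap=3):
    """Return (best_gap, counts), where counts[d] is how often `second`
    occurs d tokens after `first`; best_gap is None if they never co-occur."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != first:
                continue
            for d in range(1, max_gap + 1):
                if i + d < len(tokens) and tokens[i + d] == second:
                    counts[d] += 1
    if not counts:
        return None, counts
    return counts.most_common(1)[0][0], counts

corpus = [s.split() for s in (
    "we must take into account the cost",
    "take it into account",
    "they take into account all factors",
)]
best, counts = most_probable_distance(corpus, "take", "account")
print(best, dict(counts))  # 2 {2: 2, 3: 1}
```

Exact token matching ignores inflection; matching lemmas instead would be a natural refinement of this sketch.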


Systems, Man and Cybernetics | 2001

Acquiring syntactic information for a government pattern dictionary from large text corpora

Sofía N. Galicia-Haro; Alexander F. Gelbukh; Igor A. Bolshakov

There are several lines of research on automatic subcategorization frame acquisition, and their importance is beyond doubt. However, almost all automatic work has been done within the constituency approach. Conversely, manual work is the traditional way of acquiring syntactic information in the dependency approach, which considers the correspondence between semantic valences and their syntactic realizations. The latter approach has some advantages for describing languages with relaxed word-order constraints and extensive use of prepositions. Our work aims to compile automatically a government pattern dictionary that records this syntactic information and to provide a tool that facilitates linking valences with meaning.


Mexican International Conference on Artificial Intelligence | 2009

Supervised Recognition of Age-Related Spanish Temporal Phrases

Sofía N. Galicia-Haro; Alexander F. Gelbukh

This paper reports research on temporal expressions formed by a common temporal expression denoting a period of years, modified by an adverb of time. In a Spanish corpus, we found that some of these phrases are age-related expressions. To identify the phrases with this meaning automatically, we analyzed a larger sample obtained from the Internet and used these examples to define the features relevant to a learning method. We present preliminary results obtained by applying a decision tree.


Archive | 2015

Advances in Artificial Intelligence and Soft Computing

Grigori Sidorov; Sofía N. Galicia-Haro

This paper considers a sign-based, or semiotic, formalism. The concept of sign arose in the framework of semiotics. Neurophysiological and psychological research points to sign-based structures as the basic elements of a human subject's world model. These elements are formed during the subject's activity and communication. Within this formalism, it was possible to formulate and solve the problem of goal-setting, i.e., generating the goal of behavior.

Collaboration


Top Co-Authors

Alexander F. Gelbukh

Instituto Politécnico Nacional

Igor A. Bolshakov

Instituto Politécnico Nacional

Félix Castro Espinoza

Universidad Autónoma del Estado de Hidalgo

Grigori Sidorov

Instituto Politécnico Nacional

Alejandro Peña

Instituto Politécnico Nacional

Alonso Palomino-Garibay

National Autonomous University of Mexico

Arturo Hernández-Aguirre

Centro de Investigación en Matemáticas

Carlos A. Reyes-García

National Institute of Astrophysics

Efrén Mezura-Montes

Instituto Politécnico Nacional
