Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where J. Fco. Martínez-Trinidad is active.

Publication


Featured research published by J. Fco. Martínez-Trinidad.


Expert Systems With Applications | 2011

General framework for class-specific feature selection

Bárbara B. Pineda-Bautista; Jesús Ariel Carrasco-Ochoa; J. Fco. Martínez-Trinidad

Commonly, when a feature selection algorithm is applied, a single feature subset is selected for all the classes, but this subset could be inadequate for some of them. Class-specific feature selection allows a possibly different feature subset to be selected for each class. However, existing class-specific feature selection algorithms have been proposed for a particular classifier, which reduces their applicability. In this paper, a general framework is proposed that allows any traditional feature selector to be used for class-specific feature selection, together with any classifier. Experimental results and a comparison against traditional feature selectors show the suitability of the proposed framework.
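
The following is a minimal sketch of the general idea behind such a framework: a traditional feature selector is run once per class on a one-vs-rest version of the problem, and a classifier is then trained on each class-specific subset. The concrete selector (SelectKBest), the classifier (k-NN) and the confidence-based decision rule are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch of class-specific feature selection via one-vs-rest binarization.
# Illustrative choices (not the published framework): SelectKBest as the
# traditional selector, k-NN as the classifier, highest one-vs-rest
# confidence as the decision rule.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

subsets, models = {}, {}
for c in classes:
    y_bin = (y == c).astype(int)
    # 1) select a (possibly different) feature subset for class c
    selector = SelectKBest(score_func=f_classif, k=2).fit(X, y_bin)
    subsets[c] = selector.get_support(indices=True)
    # 2) train one binary (one-vs-rest) classifier on that subset
    models[c] = KNeighborsClassifier(n_neighbors=3).fit(X[:, subsets[c]], y_bin)

def predict(x):
    # 3) assign the class whose one-vs-rest model is most confident
    x = np.asarray(x).reshape(1, -1)
    scores = {c: models[c].predict_proba(x[:, subsets[c]])[0, 1] for c in classes}
    return max(scores, key=scores.get)

print(predict(X[0]), y[0])
```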


International Conference on Natural Language Processing | 2006

Document clustering based on maximal frequent sequences

Edith Hernández-Reyes; René Arnulfo García-Hernández; Jesús Ariel Carrasco-Ochoa; J. Fco. Martínez-Trinidad

Document clustering aims to discover groups of similar documents. The success of document clustering algorithms depends on the model used for representing the documents. Documents are commonly represented with the vector space model based on words or n-grams. However, these representations have some disadvantages, such as high dimensionality and loss of the sequential order of words. In this work, we propose a new document representation in which the maximal frequent sequences of words are used as features of the vector space model. The proposed model is evaluated by clustering different document collections and comparing it against the vector space model based on words and n-grams, through internal and external measures.
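
A simplified sketch of the representation follows: contiguous word n-grams that occur in at least a minimum number of documents stand in for maximal frequent sequences, each document becomes a binary vector over the maximal ones, and the vectors are clustered. The toy corpus, the frequency threshold and the use of k-means are assumptions for illustration only.

```python
# Simplified sketch: frequent contiguous word sequences as vector-space
# features for document clustering. Maximality is approximated by dropping
# sequences contained in a longer frequent sequence. Toy data and thresholds
# are assumptions.
from collections import defaultdict
from sklearn.cluster import KMeans

docs = ["the cat sat on the mat", "the cat sat on the rug",
        "stock markets fell sharply today", "stock markets rose sharply today"]
min_df, max_len = 2, 4   # frequency threshold and maximum sequence length (assumed)

# count in how many documents each contiguous word sequence appears
doc_freq = defaultdict(set)
for i, d in enumerate(docs):
    words = d.split()
    for n in range(1, max_len + 1):
        for j in range(len(words) - n + 1):
            doc_freq[" ".join(words[j:j + n])].add(i)
frequent = {s for s, ids in doc_freq.items() if len(ids) >= min_df}

# keep only maximal sequences (those not contained in a longer frequent one)
maximal = sorted(s for s in frequent
                 if not any(s != t and s in t for t in frequent))

# binary vector space model over the maximal sequences, then clustering
vectors = [[1 if s in d else 0 for s in maximal] for d in docs]
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(maximal)
print(labels)
```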


Iberoamerican Congress on Pattern Recognition | 2008

Prototype Selection Via Prototype Relevance

J. Arturo Olvera-López; J. Ariel Carrasco-Ochoa; J. Fco. Martínez-Trinidad

In pattern recognition, supervised classifiers use a training set T for classifying new prototypes. In practice, not all the information in T is useful for classification; therefore, it is necessary to discard irrelevant prototypes from T. This process is known as prototype selection, an important task for classifiers since it can reduce the time spent in the training and/or classification stages. Several prototype selection methods have been proposed following the Nearest Neighbor (NN) rule; in this work, we propose a new prototype selection method based on prototype relevance and border prototypes, which is faster (over large datasets) than the other tested prototype selection methods. We report experimental results showing the effectiveness of our method and compare its accuracy and runtime against other prototype selection methods.
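
The sketch below illustrates the border idea only: a prototype is kept when at least one of its nearest neighbors belongs to another class, and the reduced set is then used with a 1-NN classifier. The neighborhood size and the retention rule are assumptions, not the relevance measure proposed in the paper.

```python
# Generic border-based prototype selection (illustration of the border idea,
# not the authors' relevance measure): keep a prototype if any of its k
# nearest neighbors belongs to a different class.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)

k = 5   # neighborhood size (assumed)
keep = []
for i in range(len(X)):
    neighbors = np.argsort(dist[i])[:k]
    if np.any(y[neighbors] != y[i]):   # border prototype: has a nearby enemy
        keep.append(i)

clf = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
print(f"kept {len(keep)} of {len(X)} prototypes,",
      "accuracy on the full set:", round(clf.score(X, y), 3))
```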


Pattern Recognition | 2013

InstanceRank based on borders for instance selection

Pablo Hernandez-Leal; J. Ariel Carrasco-Ochoa; J. Fco. Martínez-Trinidad; J. Arturo Olvera-López

Instance selection algorithms are used for reducing the number of training instances. However, most of them suffer from long runtimes, which prevents their use with large datasets. In this work, we introduce an instance ranking per class based on borders (instances near instances belonging to different classes); using this ranking, we propose an instance selection algorithm (IRB). We evaluated the proposed algorithm using k-NN on small and large datasets, comparing it against state-of-the-art instance selection algorithms. In our experiments on large datasets, IRB achieves the best compromise between time and accuracy. We also tested our algorithm with SVM, LWLR and C4.5 classifiers; in all cases, the selection computed by our algorithm obtained the best accuracy on average.
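
As a rough illustration of ranking by borders, the sketch below scores each instance of a class by its distance to the nearest instance of another class and keeps only the top-ranked fraction per class. The scoring function and the retention rate are assumptions, not the published InstanceRank formulation.

```python
# Sketch of a border-based per-class ranking for instance selection
# (illustrative, not the published IRB): instances closest to other classes
# rank first, and only a fixed fraction per class is retained.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
retain = 0.3   # fraction kept per class (assumed)

selected = []
for c in np.unique(y):
    idx_c = np.where(y == c)[0]
    enemies = X[y != c]
    # distance from each instance of class c to its nearest enemy
    d_enemy = np.min(np.linalg.norm(X[idx_c][:, None, :] - enemies[None, :, :],
                                    axis=2), axis=1)
    order = np.argsort(d_enemy)              # border instances first
    selected.extend(idx_c[order[:max(1, int(retain * len(idx_c)))]])

clf = KNeighborsClassifier(n_neighbors=1).fit(X[selected], y[selected])
print(f"kept {len(selected)}/{len(X)} instances, accuracy on the full set:",
      round(clf.score(X, y), 3))
```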


Pattern Recognition | 2010

Fast k most similar neighbor classifier for mixed data (tree k-MSN)

Selene Hernández-Rodríguez; J. Fco. Martínez-Trinidad; J. Ariel Carrasco-Ochoa

The k nearest neighbor (k-NN) classifier has been a widely used nonparametric technique in Pattern Recognition because of its simplicity and good performance. In order to decide the class of a new prototype, the k-NN classifier performs an exhaustive comparison between the prototype to classify and the prototypes in the training set T. However, when T is large, the exhaustive comparison is expensive. For this reason, many fast k-NN classifiers have been developed; some of them are based on a tree structure, which is created during a preprocessing phase using the prototypes in T. Then, in a search phase, the tree is traversed to find the nearest neighbor, and the speed-up is obtained by avoiding the exploration of some parts of the tree through pruning rules that are usually based on the triangle inequality. However, in soft sciences such as Medicine, Geology and Sociology, the prototypes are usually described by numerical and categorical attributes (mixed data), and sometimes the comparison function used to compute the similarity between prototypes does not satisfy metric properties. Therefore, in this work an approximate fast k most similar neighbor classifier based on a tree structure (Tree k-MSN), for mixed data and similarity functions that do not satisfy metric properties, is proposed. Experiments with synthetic and real data are presented.
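
A very reduced sketch of the approximate tree idea for mixed data follows: prototypes are grouped around pivots with a similarity that combines numeric and categorical attributes (no metric properties assumed), and a query is compared only against the most similar group. The similarity function, the single-level tree and the toy data are assumptions, not the published Tree k-MSN.

```python
# Very simplified sketch of the tree idea for mixed data (not the published
# Tree k-MSN): prototypes are grouped around pivots using a similarity with
# numeric and categorical parts, and a query is compared only against the
# most similar group (approximate search, no triangle-inequality pruning).
import random

# mixed prototypes: (numeric age, categorical colour) with a class label
train = [((25, "red"), 0), ((30, "red"), 0), ((22, "blue"), 0),
         ((60, "green"), 1), ((58, "blue"), 1), ((65, "green"), 1)]

def similarity(a, b):
    # numeric closeness plus exact categorical match; no metric properties assumed
    return 1.0 / (1.0 + abs(a[0] - b[0])) + (1.0 if a[1] == b[1] else 0.0)

# preprocessing phase: pick pivots and assign each prototype to its most
# similar pivot (a one-level "tree" of buckets)
random.seed(0)
pivots = random.sample([p for p, _ in train], 2)
buckets = {i: [] for i in range(len(pivots))}
for p, label in train:
    best = max(range(len(pivots)), key=lambda i: similarity(p, pivots[i]))
    buckets[best].append((p, label))

def classify(query):
    # search phase: descend to the most similar bucket only (approximate)
    best = max(range(len(pivots)), key=lambda i: similarity(query, pivots[i]))
    candidates = buckets[best] or train          # fall back if the bucket is empty
    neighbor = max(candidates, key=lambda pl: similarity(query, pl[0]))
    return neighbor[1]

print(classify((27, "red")), classify((63, "green")))
```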


Intelligent Data Analysis | 2012

Building fast decision trees from large training sets

Anilu Franco-Arcega; Jesús Ariel Carrasco-Ochoa; Guillermo Sánchez-Díaz; J. Fco. Martínez-Trinidad

Decision trees are commonly used in supervised classification. Currently, supervised classification problems with large training sets are very common; however, many supervised classifiers cannot handle this amount of data. Some decision tree induction algorithms are capable of processing large training sets, but almost all of them have memory restrictions because they need to keep the whole training set, or a large part of it, in main memory. Moreover, the algorithms that do not have memory restrictions either have to choose a subset of the training set, needing extra time for this selection, or require the user to specify the values of parameters that could be very difficult to determine. In this paper, we present a new fast heuristic for building decision trees from large training sets, which overcomes some of the restrictions of state-of-the-art algorithms by using all the instances of the training set without storing all of them in main memory. Experimental results show that our algorithm is faster than the most recent algorithms for building decision trees from large training sets.
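
The sketch below illustrates only the memory aspect: the training instances are streamed once and only per-attribute class counts are kept, which is enough to choose a splitting attribute without holding the training set in memory. The synthetic stream and the entropy-based criterion are assumptions, not the heuristic proposed in the paper.

```python
# Sketch of the streaming idea (not the authors' heuristic): one pass over
# the training instances keeps only per-attribute class counts, from which a
# splitting attribute is chosen without storing the instances themselves.
import math
import random
from collections import defaultdict

def instance_stream(n):
    """Synthetic generator standing in for a large training set on disk."""
    random.seed(1)
    for _ in range(n):
        colour = random.choice(["red", "green", "blue"])
        size = random.choice(["small", "large"])
        label = 1 if colour == "red" or size == "large" else 0
        yield {"colour": colour, "size": size}, label

# counts[attribute][value][label] accumulated in a single pass
counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
for x, y in instance_stream(100_000):
    for attr, value in x.items():
        counts[attr][value][y] += 1

def weighted_entropy(attr):
    # expected class entropy after splitting on attr, from the counts alone
    total = sum(sum(c.values()) for c in counts[attr].values())
    h = 0.0
    for value_counts in counts[attr].values():
        n = sum(value_counts.values())
        h += (n / total) * -sum((c / n) * math.log2(c / n)
                                for c in value_counts.values() if c > 0)
    return h

best = min(counts, key=weighted_entropy)
print("chosen splitting attribute:", best)
```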


Expert Systems With Applications | 2011

Decision tree induction using a fast splitting attribute selection for large datasets

Anilu Franco-Arcega; Jesús Ariel Carrasco-Ochoa; Guillermo Sánchez-Díaz; J. Fco. Martínez-Trinidad

Several algorithms have been proposed in the literature for building decision trees (DT) from large datasets; however, almost all of them have memory restrictions because they need to keep the whole training set, or a large part of it, in main memory, and the algorithms that do not have memory restrictions, because they choose a subset of the training set, need extra time for this selection or have parameters that could be very difficult to determine. In this paper, we introduce a new algorithm that builds decision trees using a fast splitting attribute selection (DTFS) for large datasets. The proposed algorithm builds a DT without storing the whole training set in main memory, and it has only one parameter while being very stable with respect to it. Experimental results on both real and synthetic datasets show that our algorithm is faster than three of the most recent algorithms for building decision trees for large datasets, while achieving competitive accuracy.
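
A compact sketch of the single-parameter, incremental expansion idea follows: a leaf buffers at most S incoming instances and, once the buffer is full, picks a splitting attribute from that buffer alone and then discards it. The purity score, the synthetic data and the parameter value are assumptions, not DTFS itself.

```python
# Incremental node expansion with a single parameter S (sketch, not DTFS):
# a leaf buffers up to S instances, then chooses a splitting attribute from
# that buffer with a simple purity score and pushes the buffer down.
import random
from collections import Counter, defaultdict

S = 50   # the single parameter: instances buffered before a leaf is expanded

class Node:
    def __init__(self):
        self.split_attr, self.children, self.buffer = None, {}, []

    def insert(self, x, y):
        node = self
        while node.split_attr is not None:          # descend the current tree
            node = node.children.setdefault(x[node.split_attr], Node())
        node.buffer.append((x, y))
        if len(node.buffer) >= S:
            node.expand()

    def expand(self):
        def purity(attr):                           # fast, buffer-only criterion
            groups = defaultdict(Counter)
            for x, y in self.buffer:
                groups[x[attr]][y] += 1
            return sum(c.most_common(1)[0][1] for c in groups.values())
        self.split_attr = max(self.buffer[0][0], key=purity)
        for x, y in self.buffer:                    # push buffered instances down
            self.children.setdefault(x[self.split_attr], Node()).buffer.append((x, y))
        self.buffer = []

random.seed(0)
root = Node()
for _ in range(500):                                # synthetic instance stream
    x = {"colour": random.choice("rgb"), "size": random.choice("sl")}
    root.insert(x, 1 if x["colour"] == "r" else 0)
print("root splits on:", root.split_attr)
```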


Knowledge Discovery and Data Mining | 2008

Fast k most similar neighbor classifier for mixed data based on approximating and eliminating

Selene Hernández-Rodríguez; J. Ariel Carrasco-Ochoa; J. Fco. Martínez-Trinidad

The k nearest neighbor (k-NN) classifier has been a widely used nonparametric technique in Pattern Recognition. In order to decide the class of a new prototype, the k-NN classifier performs an exhaustive comparison between the prototype to classify (query) and the prototypes in the training set T. However, when T is large, the exhaustive comparison is expensive. To avoid this problem, many fast k-NN algorithms have been developed; some of them are based on Approximating-Eliminating search, where the Approximating and Eliminating steps rely on the triangle inequality. However, in soft sciences, the prototypes are usually described by qualitative and quantitative features (mixed data), and sometimes the comparison function does not satisfy the triangle inequality. Therefore, in this work a fast k most similar neighbor classifier for mixed data (AEMD) is presented. This classifier consists of two phases. In the first phase, a binary similarity matrix among the prototypes in T is stored. In the second phase, new Approximating and Eliminating steps, which are not based on the triangle inequality, are applied. The proposed classifier is compared against other fast k-NN algorithms adapted to work with mixed data. Experiments with real datasets are presented.
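
The sketch below illustrates the approximating-and-eliminating pattern with a precomputed binary similarity matrix instead of triangle-inequality bounds: when the query turns out to be similar to a visited prototype, prototypes that are not binary-similar to it are discarded without being compared to the query. The similarity function, the threshold and the elimination rule are assumptions, not the published AEMD steps.

```python
# Approximating-and-eliminating with a binary similarity matrix (sketch, not
# the published AEMD rules): if the query is similar to a visited prototype,
# prototypes not binary-similar to it are eliminated without comparison.

# mixed-data prototypes: (numeric, categorical) with class labels
protos = [((25, "red"), 0), ((27, "red"), 0), ((24, "blue"), 0),
          ((61, "green"), 1), ((59, "blue"), 1), ((66, "green"), 1)]

def sim(a, b):                      # no metric properties assumed
    return 1.0 / (1.0 + abs(a[0] - b[0])) + (1.0 if a[1] == b[1] else 0.0)

THRESH = 0.5                        # similarity threshold (assumed)
# first phase: binary similarity matrix among the training prototypes
M = [[sim(p, q) >= THRESH for q, _ in protos] for p, _ in protos]

def most_similar(query):
    # second phase: approximate by comparing, eliminate via the binary matrix
    candidates = set(range(len(protos)))
    best, best_sim, comparisons = None, -1.0, 0
    while candidates:
        i = candidates.pop()
        s = sim(query, protos[i][0]); comparisons += 1   # approximating step
        if s > best_sim:
            best, best_sim = i, s
        if s >= THRESH:                                  # eliminating step
            candidates = {j for j in candidates if M[i][j]}
    return protos[best][1], comparisons

label, used = most_similar((26, "red"))
print("predicted class:", label, "after", used, "of", len(protos), "comparisons")
```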


Iberoamerican Congress on Pattern Recognition | 2006

Document representation based on maximal frequent sequence sets

Edith Hernández-Reyes; J. Fco. Martínez-Trinidad; Jesús Ariel Carrasco-Ochoa; René Arnulfo García-Hernández

In document clustering, documents are commonly represented through the vector space model as a word vector whose features correspond to the words of the documents. However, a document collection contains a large number of words, so the vector size can be enormous. Also, the vector space model does not take into account the word order, which could be useful for grouping similar documents. In order to reduce these disadvantages, we propose a new document representation in which each document is represented as the set of its maximal frequent sequences. The proposed representation is applied to document clustering, the quality of the clustering is evaluated through internal and external measures, and the results are compared with those obtained with the vector space model.
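
A tiny sketch of the set-based view follows: each document is reduced to the set of maximal frequent sequences it contains, and documents are compared directly as sets, here with Jaccard similarity. The toy sequence sets and the choice of Jaccard are assumptions made only for illustration.

```python
# Set-based document representation (sketch): documents as sets of maximal
# frequent sequences, compared with Jaccard similarity. The toy sets are
# assumed, not extracted from a real collection.

doc_mfs = {
    "d1": {"stock markets fell", "sharply today"},
    "d2": {"stock markets rose", "sharply today"},
    "d3": {"the cat sat", "on the mat"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

for x in doc_mfs:
    for y in doc_mfs:
        if x < y:
            print(x, y, round(jaccard(doc_mfs[x], doc_mfs[y]), 2))
```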


Intelligent Data Analysis | 2009

Prototype selection based on sequential search

José Arturo Olvera-López; J. Fco. Martínez-Trinidad; Jesús Ariel Carrasco-Ochoa; Josef Kittler

In this paper, we propose and explore the use of sequential search for solving the prototype selection problem, since this kind of search has shown good performance on selection problems. We propose three prototype selection methods based on sequential search. The main goal of our methods is to reduce the training data without losing too much classification accuracy. Experiments are reported showing the effectiveness of the proposed methods and comparing their performance against other prototype selection methods.
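
The sketch below shows one sequential (forward) search applied to prototype selection: starting from an empty set, the prototype whose inclusion most improves 1-NN accuracy on the training set is added greedily until no addition helps. The criterion and the stopping rule are illustrative choices; the paper proposes three sequential-search-based methods whose details differ.

```python
# Sequential forward search for prototype selection (sketch): greedily add
# the prototype whose inclusion most improves 1-NN accuracy on the training
# set; stop when no candidate improves it. Criterion and stopping rule are
# illustrative assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X, y = X[::5], y[::5]                # small subsample to keep the sketch fast

def accuracy(subset):
    clf = KNeighborsClassifier(n_neighbors=1).fit(X[subset], y[subset])
    return clf.score(X, y)

selected, best_acc = [], 0.0
while True:
    candidates = [i for i in range(len(X)) if i not in selected]
    if not candidates:
        break
    acc, i = max((accuracy(selected + [i]), i) for i in candidates)
    if acc <= best_acc:              # no candidate improves the criterion
        break
    selected.append(i)
    best_acc = acc

print(f"selected {len(selected)}/{len(X)} prototypes, accuracy {best_acc:.3f}")
```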

Collaboration


Dive into J. Fco. Martínez-Trinidad's collaborations.

Top Co-Authors

Jesús Ariel Carrasco-Ochoa
National Institute of Astrophysics

J. Ariel Carrasco-Ochoa
National Institute of Astrophysics

J. Arturo Olvera-López
National Institute of Astrophysics

Anilu Franco-Arcega
National Institute of Astrophysics

Selene Hernández-Rodríguez
National Institute of Astrophysics

Bárbara B. Pineda-Bautista
National Institute of Astrophysics

José Arturo Olvera-López
National Institute of Astrophysics