Junior Barrera | Researchain

Publication

Featured research published by Junior Barrera.

Journal of Computational Biology | 2002

Inference from Clustering with Application to Gene-Expression Microarrays

Edward R. Dougherty; Junior Barrera; Marcel Brun; Seungchan Kim; Roberto M. Cesar; Yidong Chen; Michael L. Bittner; Jeffrey M. Trent

There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. A large amount of generated output is available over the web.
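
The evaluation loop described in this abstract (model each class as a mean plus independent noise, generate points, cluster them, count the points clustered incorrectly) can be illustrated with a minimal Python sketch. It is not the toolbox itself: the function names, the Gaussian noise, the farthest-point initialisation and the small K-means routine are all assumptions chosen for brevity.

```python
import itertools
import random


def generate_sample(means, sigma, points_per_class, rng):
    """Draw points as class mean plus independent Gaussian noise (assumed noise model)."""
    points, labels = [], []
    for label, mean in enumerate(means):
        for _ in range(points_per_class):
            points.append([m + rng.gauss(0.0, sigma) for m in mean])
            labels.append(label)
    return points, labels


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=50):
    """Plain Lloyd iterations with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def clustering_error(true_labels, assignment, k):
    """Misclustered points under the best matching of cluster ids to generating classes."""
    best = len(true_labels)
    for perm in itertools.permutations(range(k)):
        errors = sum(1 for t, a in zip(true_labels, assignment) if perm[a] != t)
        best = min(best, errors)
    return best


# Hypothetical hand-picked mean profiles for two expression patterns.
rng = random.Random(0)
means = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
points, labels = generate_sample(means, sigma=0.5, points_per_class=20, rng=rng)
error = clustering_error(labels, k_means(points, 2), 2)
```

Repeating the last three lines for increasing `sigma`, or after averaging several replicated points per item, gives the kind of error-versus-variance and error-versus-replication curves the abstract refers to.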

SIAM Journal on Applied Mathematics | 1991

Minimal representations for translation-invariant set mappings by mathematical morphology

Gerald Jean Francis Banon; Junior Barrera

In his 1975 book, Matheron introduced a pair of dual representations, written in terms of erosions and dilations, for increasing translation invariant set mappings, using the concept of a kernel. Based on hit-miss topology, Maragos, in his 1985 Ph.D. thesis, has given sufficient conditions under which the increasing mappings have minimal representations. In this paper, a pair of dual representations for translation-invariant set mappings (not necessarily increasing) is presented. It is shown that under the same sufficient conditions such mappings have minimal representations. Actually, the representations of Matheron and Maragos are special cases of the proposed ones. Finally, some examples are given to illustrate the theory.
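
To make the idea of representing a translation-invariant mapping that is not increasing more concrete, the following Python sketch checks such a representation by brute force on a tiny domain. Several things here are assumptions rather than content of the abstract: the cyclic domain Z_5 (chosen so that translations are well defined on a finite set), the example mapping `psi`, the function names, and the form of the building block, taken to be the sup-generating operator that maps X to the set of points x with A + x ⊆ X ⊆ B + x, as it is commonly written in the morphology literature.

```python
from itertools import combinations

N = 5  # size of the cyclic domain Z_N (an assumption of this sketch)
DOMAIN = range(N)


def subsets(base):
    """Yield every subset of `base` as a frozenset."""
    items = sorted(base)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def translate(X, x):
    return frozenset((p + x) % N for p in X)


def psi(X):
    """Example mapping: points of X whose right neighbour is not in X.

    It is translation invariant but not increasing.
    """
    return frozenset(x for x in X if (x + 1) % N not in X)


def kernel(mapping):
    """All sets X such that the origin belongs to mapping(X)."""
    return {X for X in subsets(DOMAIN) if 0 in mapping(X)}


def sup_generating(A, B, X):
    """Points x such that A + x is inside X and X is inside B + x."""
    return frozenset(x for x in DOMAIN if translate(A, x) <= X <= translate(B, x))


def closed_intervals(K):
    """Yield pairs (A, B) whose whole interval [A, B] is contained in K."""
    for B in K:
        for A in subsets(B):
            if all((A | extra) in K for extra in subsets(B - A)):
                yield A, B


def represent(mapping):
    """Rebuild a mapping as a union of sup-generating operators over its kernel."""
    intervals = list(closed_intervals(kernel(mapping)))

    def rebuilt(X):
        out = frozenset()
        for A, B in intervals:
            out |= sup_generating(A, B, X)
        return out

    return rebuilt


rebuilt_psi = represent(psi)
matches = all(rebuilt_psi(X) == psi(X) for X in subsets(DOMAIN))
```

In this example the kernel is a single interval, from {0} up to the whole domain without the point 1, so one sup-generating operator already reproduces `psi`; keeping only such maximal intervals is the kind of reduction a minimal representation aims at.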

Signal Processing | 1993

Decomposition of mappings between complete lattices by mathematical morphology, Part I: General lattices

Gerald Jean Francis Banon; Junior Barrera

Two canonical decompositions of mappings between complete lattices are presented. These decompositions are based on the elementary mappings of mathematical morphology: erosions, anti-erosions, dilations and anti-dilations. The proposed decompositions are obtained by introducing the concept of morphological connection, which extends the notion of Galois connection. The definitions of sup-generating mapping, kernel and basis within the framework of complete lattices are given. The decompositions are built by analysing the kernel and may be simplified from the basis. The results are specialized to the cases of inf-separable, increasing and decreasing mappings. The presented decompositions are dual. Some examples, including the case of Boolean function simplification, illustrate the key concepts and the decomposition rule.

Journal of Computational Biology | 2002

Strong Feature Sets from Small Samples

Seungchan Kim; Edward R. Dougherty; Junior Barrera; Yidong Chen; Michael L. Bittner; Jeffrey M. Trent

For small samples, classifier design algorithms typically suffer from overfitting. Given a set of features, a classifier must be designed and its error estimated. For small samples, an error estimator may be unbiased but, owing to a large variance, often give very optimistic estimates. This paper proposes mitigating the small-sample problem by designing classifiers from a probability distribution resulting from spreading the mass of the sample points to make classification more difficult, while maintaining sample geometry. The algorithm is parameterized by the variance of the spreading distribution. By increasing the spread, the algorithm finds gene sets whose classification accuracy remains strong relative to greater spreading of the sample. The error gives a measure of the strength of the feature set as a function of the spread. The algorithm yields feature sets that can distinguish the two classes, not only for the sample data, but for distributions spread beyond the sample data. For linear classifiers, the topic of the present paper, the classifiers are derived analytically from the model, thereby providing an enormous savings in computation time. The algorithm is applied to cancer classification via cDNA microarrays. In particular, the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the algorithm is used to find gene sets whose expressions can be used to classify BRCA1 and BRCA2 tumors.
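
The idea of spreading the mass of each sample point and measuring the error of a linear classifier as a function of the spread can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's algorithm: the spread is taken to be a spherical Gaussian with standard deviation `sigma` centred on each sample point, the classifier `(w, b)` is fixed by hand rather than derived from the model, and all names and data are hypothetical. Under those assumptions the error has a closed form, because the projection of a Gaussian onto `w` is again Gaussian.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def spread_error(points, labels, w, b, sigma):
    """Error of the classifier sign(w.x + b) when each point is spread as N(x, sigma^2 I).

    Labels are +1 or -1. For one point, the misclassification probability is
    Phi(-margin / sigma), where margin is the signed distance to the hyperplane.
    """
    norm_w = math.sqrt(sum(c * c for c in w))
    total = 0.0
    for x, y in zip(points, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w
        total += normal_cdf(-margin / sigma)
    return total / len(points)


# Hypothetical two-gene expression values for four samples in two classes.
points = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
labels = [1, 1, -1, -1]
w, b = (1.0, 0.0), 0.0

errors_by_spread = [spread_error(points, labels, w, b, s) for s in (0.5, 1.0, 2.0, 4.0)]
```

A feature set whose error stays low as `sigma` grows would count as strong in the sense of the abstract, since it separates the classes even for distributions spread well beyond the sample.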

Journal of Electronic Imaging | 1997

Automatic programming of binary morphological machines by design of statistically optimal operators in the context of computational learning theory

Junior Barrera; Edward R. Dougherty; Nina Sumiko Tomita

Representation of set operators by artificial neural networks and design of such operators by inference of network parameters is a popular technique in binary image analysis. We propose an alternative to this technique: automatic programming of morphological machines (MMachs) by the design of statistically optimal operators. We propose a formulation of the procedure for designing set operators that extends the one stated by Dougherty for binary image restoration, show the relation of this new formulation with the one stated by Haussler for learning Boolean concepts in the context of machine learning theory (which usually is applied to neural networks), present a new learning algorithm for Boolean concepts represented as MMach programs, and give some application examples in binary image analysis.

Brazilian Symposium on Computer Graphics and Image Processing | 2001

Microarray gridding by mathematical morphology

Roberto Hirata; Junior Barrera; Ronaldo Fumio Hashimoto; Daniel O. Dantas

DNA chips (i.e., microarrays) biotechnology is a hybridization (i.e., DNA matching) based process that makes it possible to quantify the relative abundance of mRNA from two distinct samples by analysing their fluorescence signals. This technique requires robotic placement (i.e., spotting) of thousands of cDNAs (i.e., complementary DNA) in an array format on glass microscope slides which provide gene-specific hybridization targets. The two different samples of mRNA, usually labeled with Cy3 and Cy5 fluorochromes, are cohybridized onto each spotted gene and two digital images, one for each fluorochrome, are acquired after hybridization. Before estimating the signal and background of each spot, it is necessary to locate the region of the spot in order to map the gene information with the corresponding spot. Therefore, these images must be segmented for analysis, that is, the spotting geometric structure must be found. That implies segmenting the subarrays (i.e., the set of grouped spots), and then the positions of the spots in each subarray. The authors introduce a new technique using morphological operators that performs automatic gridding procedures (i.e., subarrays and spot segmentation). This technique has been implemented and tested in a variety of microarray images with success.

Archive | 2007

Constructing Probabilistic Genetic Networks of Plasmodium falciparum from Dynamical Expression Signals of the Intraerythrocytic Development Cycle

Junior Barrera; Roberto M. Cesar; David Correa Martins; Ricardo Z. N. Vêncio; Emilio F. Merino; Marcio Yamamoto; Florencia Leonardi; Carlos Alberto Pereira; Hernando A. del Portillo

The completion of the genome sequence of Plasmodium falciparum revealed that close to 60% of the annotated genome corresponds to hypothetical proteins and that many genes, whose metabolic pathways or biological products are known, have not been predicted from sequence similarity searches. Recently, using global gene expression of the asexual blood stages of P. falciparum at 1 h resolution scale and Discrete Fourier Transform based techniques, it has been demonstrated that many genes are regulated in a single periodic manner during the asexual blood stages. Moreover, by ordering the genes according to the phase of expression, a new list of targets for vaccine and drug development was generated. In the present paper, genes are annotated under a different perspective: a list of functional properties is attributed to networks of genes representing subsystems of the P. falciparum regulatory expression system. The model developed to represent genetic networks, called Probabilistic Genetic Network (PGN), is a Markov chain with some additional properties. This model mimics the properties of a gene as a non-linear stochastic gate and the systems are built by coupling of these gates. Moreover, a tool that integrates mining of dynamical expression signals by PGN design techniques, different databases and biological knowledge, was developed. The applicability of this tool for discovering gene networks of the malaria expression regulation system has been validated using the glycolytic pathway as a “gold-standard”, as well as by creating an apicoplast PGN network. Presently, we are tentatively improving the network design technique before trying to validate results from the apicoplast PGN network through reverse genetics approaches.

Real-time Imaging | 2002

Segmentation of microarray images by mathematical morphology

Roberto Hirata; Junior Barrera; Ronaldo Fumio Hashimoto; Daniel O. Dantas; Gustavo H. Esteves

DNA chips (i.e., microarrays) biotechnology is a hybridization (i.e., matching of pairs of DNA)-based process that makes it possible to quantify the relative abundance of mRNA of two distinct samples by analyzing their fluorescence signals. This technique requires robotic placement (i.e., spotting) of thousands of cDNAs (i.e., complementary DNA) in an array format on glass microscope slides. The spotted cDNAs are the hybridization targets for the mRNA samples. The two different samples of mRNA, usually labeled with Cy3 and Cy5 fluorochromes, are cohybridized onto each spotted gene. After hybridization, one digital image is acquired for each fluorochrome wavelength. Then, it is necessary to recognize each gene by its position in the array and to estimate its signal (i.e., hybridization information). For that, it is necessary to segment the image into three classes of objects: subarrays (i.e., set of grouped spots), spot box (i.e., the rectangular neighborhood that contains a spot) and spot (i.e., region of the image where there exists signal). In this paper, we present a technique based on mathematical morphology that performs this segmentation. Detailed experimental results are presented on the website http://www.vision.ime.usp.br/demos/microarray/.

Fundamenta Informaticae | 2000

Automatic Programming of Morphological Machines by PAC Learning

Junior Barrera; Routo Terada; Roberto Hirata; Nina S. T. Hirata

An important aspect of mathematical morphology is the description of complete lattice operators by a formal language, the Morphological Language (ML), whose vocabulary is composed of infimum, supremum, dilations, erosions, anti-dilations and anti-erosions. This language is complete (i.e., it can represent any complete lattice operator) and expressive (i.e., many useful operators can be represented as phrases with relatively few words). Since the sixties special machines, the Morphological Machines (MMachs), have been built to implement the ML restricted to the lattices of binary and gray-scale images. However, designing useful MMach programs is not an elementary task. Recently, much research effort has been addressed to automate the programming of MMachs. The goal of the different approaches for this problem is to find suitable knowledge representation formalisms to describe transformations over geometric structures and to translate them automatically into MMach programs by computational systems. We present here the central ideas of an approach based on the representation of transformations by collections of observed-ideal pairs of images and the estimation of suitable operators from these data. In this approach, the estimation of operators is based on statistical optimization or, equivalently, on a branch of Machine Learning Theory known as PAC Learning. These operators are generated as standard form morphological operators that may be simplified (i.e., transformed into equivalent morphological operators that use fewer vocabulary words) by syntactical transformations.
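
Estimating an operator from observed-ideal pairs can be illustrated with a small Python sketch on one-dimensional binary signals. It is a simplified stand-in, not the authors' system: the window offsets, the zero padding at the borders, the majority vote per window pattern, the default output for unseen patterns and all function names are assumptions made for this example.

```python
from collections import defaultdict


def window_pattern(signal, i, offsets):
    """Values seen through the window centred at i, with zero padding outside the signal."""
    return tuple(signal[i + d] if 0 <= i + d < len(signal) else 0 for d in offsets)


def learn_operator(pairs, offsets):
    """Estimate a window operator from (observed, ideal) pairs.

    For every window pattern, count how often the ideal value is 0 or 1
    and keep the majority value, which minimises the empirical error.
    """
    counts = defaultdict(lambda: [0, 0])
    for observed, ideal in pairs:
        for i, target in enumerate(ideal):
            counts[window_pattern(observed, i, offsets)][target] += 1
    return {pattern: int(n1 > n0) for pattern, (n0, n1) in counts.items()}


def apply_operator(table, signal, offsets, default=0):
    """Apply the learned truth table; unseen patterns fall back to `default`."""
    return [
        table.get(window_pattern(signal, i, offsets), default)
        for i in range(len(signal))
    ]


# Hypothetical training pair: the ideal signal removes isolated 1s from the observed one.
observed = [0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0]
ideal = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
offsets = (-1, 0, 1)

table = learn_operator([(observed, ideal)], offsets)
result = apply_operator(table, [0, 0, 1, 0, 1, 1, 0], offsets)
```

The learned `table` is the truth table of a Boolean function over the window. The patterns mapped to 1 are the raw material that a subsequent simplification step would turn into a compact morphological expression.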

Information Sciences | 2014

A feature selection technique for inference of graphs from their known topological properties: Revealing scale-free gene regulatory networks

Fabrício Martins Lopes; David Correa Martins; Junior Barrera; Roberto M. Cesar

An important problem in bioinformatics is the inference of gene regulatory networks (GRNs) from expression profiles. In general, the main limitations faced by GRN inference methods are the small number of samples with huge dimensionalities and the noisy nature of the expression measurements. Alternatives are thus needed to obtain better accuracy for the GRN inference problem. Many pattern recognition techniques rely on prior knowledge about the problem in addition to the training data to gain statistical estimation power. This work addresses the GRN inference problem by modeling prior knowledge about the network topology. The main contribution of this paper is a novel methodology that incorporates scale-free properties into a classical low-cost feature selection method, known as Sequential Floating Forward Selection (SFFS), for guiding the inference task. Such methodology explores the search space iteratively by applying a scale-free property to reduce the search space. In this way, the search space traversed by the method integrates the exploration of all combinations of predictor sets when the number of combinations is small (dimensionality ⟨k⟩ ≤ 2) with a floating search when the number of combinations becomes explosive (dimensionality ⟨k⟩ ≥ 3). This process is guided by scale-free prior information. Experimental results using synthetic and real data show that this technique provides smaller estimation errors than those obtained without guiding the SFFS application by the scale-free model, thus maintaining the robustness of the SFFS method. Therefore, we show that the proposed framework may be applied in combination with other existing GRN inference methods to improve the prediction accuracy of networks with scale-free properties.
