Marco A. Alvarez
Utah State University
Publication
Featured research published by Marco A. Alvarez.
International Conference on Semantic Computing | 2007
Marco A. Alvarez; SeungJin Lim
Measuring the semantic similarity between pairs of words is considered a fundamental operation in data mining and information retrieval. Nevertheless, developing a computational method that produces results close to human judgment remains difficult, partly because similarity is subjective. This paper presents a novel algorithm, SSA, for scoring the semantic similarity between words. Given two input words w1 and w2, SSA exploits their corresponding concepts, relationships, and descriptive glosses available in WordNet to build a rooted weighted graph Gsim. The output score is calculated by exploring the concepts present in Gsim and selecting the minimal distance between any two concepts c1 and c2 of w1 and w2, respectively. The distance combines: 1) the depth of the nearest common ancestor of c1 and c2 in Gsim, 2) the intersection of the descriptive glosses of c1 and c2, and 3) the shortest distance between c1 and c2 in Gsim. SSA achieves a correlation of 0.913 with the human ratings reported by Miller and Charles (1991) for a dataset of 28 pairs of nouns. Furthermore, on the full dataset of 65 pairs presented by Rubenstein and Goodenough (1965), the correlation between SSA results and the known human ratings is 0.903, which is higher than that of all other algorithms reported for the same dataset. These high correlations with human ratings suggest that SSA would be useful in solving several data mining and information retrieval problems.
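The three ingredients of the distance can be illustrated on a toy example. The sketch below is not the published SSA: the taxonomy, the glosses, and every function name are hypothetical, the graph is an unweighted tree instead of the weighted Gsim, and the abstract does not state how the three quantities are combined, so they are only computed separately.

```python
# Toy is-a taxonomy (child -> parent); a hypothetical stand-in for WordNet.
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "bicycle": "vehicle",
    "fruit": "entity",
}

# Toy glosses, also hypothetical.
GLOSS = {
    "car": "a motor vehicle with four wheels",
    "bicycle": "a vehicle with two wheels moved by pedals",
}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Number of edges between the concept and the root."""
    return len(ancestors(concept)) - 1


def nearest_common_ancestor(c1, c2):
    """First ancestor of c1 (walking upward) that is also an ancestor of c2."""
    seen = set(ancestors(c2))
    for node in ancestors(c1):
        if node in seen:
            return node
    return None


def path_length(c1, c2):
    """Edges on the path c1 -> nearest common ancestor -> c2 (tree case)."""
    nca = nearest_common_ancestor(c1, c2)
    return (depth(c1) - depth(nca)) + (depth(c2) - depth(nca))


def gloss_overlap(c1, c2):
    """Number of distinct words shared by the two glosses."""
    return len(set(GLOSS[c1].split()) & set(GLOSS[c2].split()))
```

For "car" and "bicycle" in this toy data, the nearest common ancestor is "vehicle" at depth 1, the path length is 2, and the glosses share 4 words.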
Journal of Bioinformatics and Computational Biology | 2011
Marco A. Alvarez; Changhui Yan
Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases, such as Gene Ontology Annotation (GOA), that annotate gene products using the GO terms. This dependency leads to some limitations in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with the functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive. SSA provides a reliable measure of semantic similarity that is independent of external databases of functional-annotation observations.
IEEE Transactions on Learning Technologies | 2010
Brett E. Shelton; Jon Scoresby; Tim Stowell; Michael R. Capell; Marco A. Alvarez; K C Coats
Using open source components to assemble a working 3D game engine is an attractive alternative to purchasing off-the-shelf technology. A student development team can use many different resources to investigate what underlying mechanisms are needed to build virtual environments. However, the techniques and processes involved when using open source components offer unique insights and educational opportunities. Leveraging and modifying existing software, and participating in the open source community, may alter the perspective of how game engines can be created. In this work, the process of building a simulation 3D game engine to support a training application for emergency response personnel is discussed. Evidence is presented that researching, gathering, and assembling open source components to build an open educational resource (OER), in this case a virtual 3D application, holds educational value. The research focuses on students whose interests cross disciplines of computer science, educational technology, instructional design, and game design.
Computational Biology and Chemistry | 2012
Marco A. Alvarez; Changhui Yan
As structural proteomics projects produce an increasing number of protein structures with unknown function, methods that can reliably predict protein function from protein structure are urgently needed. In this paper, we present a method that explores the clustering patterns of amino acids in three-dimensional space for protein function prediction. First, amino acid residues on a protein structure are clustered into spatial groups using hierarchical agglomerative clustering, based on the distance between them. Second, the protein structure is represented as a graph, where each node denotes a cluster of amino acids. The nodes are labeled with an evolutionary profile derived from the multiple alignment of homologous sequences. Then, a shortest-path graph kernel is used to calculate similarities between the graphs. Finally, a support vector machine using this graph kernel is used to train classifiers for protein function prediction. We applied the proposed method to two separate problems, namely, prediction of enzymes and prediction of DNA-binding proteins. In both cases, the results showed that the proposed method outperformed other state-of-the-art methods.
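The shortest-path graph kernel step can be sketched in a simplified form. The version below is an assumption-laden illustration, not the authors' implementation: it treats graphs as unweighted and unlabeled (the paper labels nodes with evolutionary profiles), and it compares path lengths with an exact-match (delta) kernel. All function and variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations

INF = float("inf")


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on an undirected, unweighted graph with nodes 0..n-1."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def path_length_histogram(n, edges):
    """Count how many connected node pairs sit at each shortest-path length."""
    dist = all_pairs_shortest_paths(n, edges)
    return Counter(
        dist[i][j] for i, j in combinations(range(n), 2) if dist[i][j] < INF
    )


def shortest_path_kernel(graph_a, graph_b):
    """Number of (pair in A, pair in B) combinations with equal path length."""
    hist_a = path_length_histogram(*graph_a)
    hist_b = path_length_histogram(*graph_b)
    return sum(count * hist_b[length] for length, count in hist_a.items())


# Two toy graphs given as (node count, edge list).
path3 = (3, [(0, 1), (1, 2)])
triangle = (3, [(0, 1), (1, 2), (0, 2)])
```

Here `shortest_path_kernel(path3, triangle)` evaluates to 6: the path graph has two node pairs at distance 1 and the triangle has three, while the path graph's single distance-2 pair has no match. A matrix of such kernel values over all graph pairs is what a kernel-based classifier such as an SVM would consume.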
Technical Symposium on Computer Science Education | 2008
Marco A. Alvarez; José A. Baiocchi; José Antonio Pow Sang
In Peru, computing as an academic field has been neglected for decades, resulting in a low-quality higher education system and, consequently, an academic community with only modest participation in scientific production worldwide. At the undergraduate level, universities have not adopted international standards or curriculum recommendations that could help improve the quality of computing and related engineering programs. Given this context, this document presents an overview of the current situation, followed by suggestions for strengthening the field by following well-established international practices while respecting local characteristics.
Pacific-Rim Symposium on Image and Video Technology | 2007
Roberto Viana; Ricardo B. Rodrigues; Marco A. Alvarez; Hemerson Pistori
The performance of Support Vector Machines, like that of many other machine learning algorithms, is very sensitive to parameter tuning, especially in real-world problems. In this paper, two well-known and widely used SVM implementations, Weka SMO and LIBSVM, were compared using Simulated Annealing as a parameter tuner. This approach significantly increased classification accuracy over the standard configurations of Weka SMO and LIBSVM. The paper also presents an empirical evaluation of SVM against AdaBoost and MLP for solving the leather defect classification problem. The results are very promising for discriminating leather defects, with the highest overall accuracy, 99.59%, achieved by LIBSVM tuned with Simulated Annealing.
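A minimal sketch of simulated annealing used as a parameter tuner follows. It makes several assumptions not found in the abstract: the objective `toy_accuracy` is a hypothetical smooth function standing in for cross-validated accuracy, the two tuned values are imagined as log-scaled SVM parameters, and the step size, starting temperature, and cooling rate are arbitrary choices.

```python
import math
import random


def toy_accuracy(log_c, log_gamma):
    """Hypothetical stand-in for cross-validated accuracy; peaks at (2, -3)."""
    return math.exp(-((log_c - 2.0) ** 2 + (log_gamma + 3.0) ** 2) / 8.0)


def anneal(score, start, steps=2000, temp=1.0, cooling=0.995, seed=0):
    """Maximise score(*params) with a basic simulated annealing loop."""
    rng = random.Random(seed)
    current, current_score = start, score(*start)
    best, best_score = current, current_score
    for _ in range(steps):
        # Propose a neighbour by perturbing every parameter slightly.
        candidate = tuple(p + rng.gauss(0.0, 0.5) for p in current)
        candidate_score = score(*candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp *= cooling
    return best, best_score


best_params, best_score = anneal(toy_accuracy, start=(0.0, 0.0))
```

In a real tuning run, `score` would train and cross-validate a classifier for each candidate setting, which is far more expensive than the toy function used here.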
Computational Intelligence in Bioinformatics and Computational Biology | 2010
Marco A. Alvarez; Changhui Yan
Computational methods play an important role in investigating the relationships between protein structure and function. In this study, we evaluate different graph representations of protein structures for kernel-based protein function prediction. We use shortest path graph kernels and support vector machines to predict whether a protein is an enzyme or not. We present three different and straightforward strategies for modeling protein structures. Accuracy averages for 10-fold cross-validation range from 84.31% to 86.97% for different modeling strategies, outperforming state-of-the-art work.
International Conference on Cyber Security and Cloud Computing | 2016
Lifan Xu; Dong Ping Zhang; Marco A. Alvarez; Jose Andre Morales; Xudong Ma; John Cavazos
Malware classification for the Android ecosystem can be performed using a range of techniques. One major technique that has been gaining ground recently is dynamic analysis based on system call invocations recorded during the execution of Android applications. Dynamic analysis has traditionally been based on converting system calls into flat feature vectors and feeding the vectors into machine learning algorithms for classification. In this paper, we implement three traditional feature-vector-based representations for Android system calls. For each feature vector representation, we also propose a novel graph-based representation. We then use graph kernels to compute pairwise similarities and feed these similarity measures into a Support Vector Machine (SVM) for classification. To speed up the graph kernel computation, we compress the graphs using the Compressed Row Storage format and then apply OpenMP to parallelize the computation. Experiments show that the graph-based representations improve classification accuracy over the corresponding feature-vector-based representations built from the same input. Finally, we show that different representations can be combined to further improve classification accuracy.
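The Compressed Row Storage layout mentioned above can be shown on a small adjacency matrix. This is a generic sketch of the format, not the paper's code; the function names and the example graph are hypothetical.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) into CSR arrays."""
    values, col_index, row_ptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                values.append(value)
                col_index.append(col)
        # row_ptr[i + 1] marks where row i ends in values/col_index.
        row_ptr.append(len(values))
    return values, col_index, row_ptr


def neighbours(csr, node):
    """Column indices of the nonzero entries in one row."""
    _, col_index, row_ptr = csr
    return col_index[row_ptr[node]:row_ptr[node + 1]]


# Star-shaped toy graph: node 1 is connected to nodes 0, 2, and 3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
csr = to_csr(adjacency)
```

For this matrix, `csr` holds six stored values, column indices `[1, 0, 2, 3, 1, 1]`, and row pointers `[0, 1, 4, 5, 6]`, so each node's neighbours occupy one contiguous slice. That contiguity is what makes row-wise traversal cheap and easy to split across threads.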
International Conference on Digital Information Management | 2007
Marco A. Alvarez; SeungJin Lim
This paper presents a solution to the problem of finding interchangeable words in the context of an input collection of strings. Interchangeable words are words that can replace one another in phrases or free text without altering the meaning. Under restricted conditions, pairs of interchangeable words might be useful for data deduplication, copy detection, and software localization, among other tasks. Calculating the degree of interchangeability involves the accurate calculation of semantic similarity between pairs of words and the search for candidate pairs in the overall search space imposed by the input collection. The solution presented in this paper is composed of a search method for candidate pairs using the Levenshtein distance algorithm and a novel algorithm, SSA, for calculating the semantic similarity between words. The proposed solution was implemented and tested within a real-world application involving a string message database from a software development company. The system was used to build an ontology with clusters of interchangeable words.
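The Levenshtein distance used in the candidate search can be computed with the standard dynamic-programming recurrence. The sketch below is illustrative only: the abstract does not describe how the distance drives the search, so the `candidate_pairs` filter and its `max_distance` threshold are hypothetical.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                   # delete ch_a
                current[j - 1] + 1,                # insert ch_b
                previous[j - 1] + (ch_a != ch_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def candidate_pairs(words, max_distance):
    """Hypothetical filter: keep word pairs within max_distance edits."""
    return [
        (a, b)
        for a, b in combinations(words, 2)
        if levenshtein(a, b) <= max_distance
    ]
```

For example, `levenshtein("kitten", "sitting")` is 3, and `candidate_pairs(["color", "colour", "delete"], 1)` keeps only the pair `("color", "colour")`.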
Journal of Biomedical Semantics | 2011
Marco A. Alvarez; Xiaojun Qi; Changhui Yan