Adrian Silvescu | Researchain

Publication

Featured research published by Adrian Silvescu.

BMC Bioinformatics | 2007

Glycosylation site prediction using ensembles of Support Vector Machine classifiers

Cornelia Caragea; Jivko Sinapov; Adrian Silvescu; Drena Dobbs; Vasant G. Honavar

Background: Glycosylation is one of the most complex post-translational modifications (PTMs) of proteins in eukaryotic cells. Glycosylation plays an important role in biological processes ranging from protein folding and subcellular localization, to ligand recognition and cell-cell interactions. Experimental identification of glycosylation sites is expensive and laborious. Hence, there is significant interest in the development of computational methods for reliable prediction of glycosylation sites from amino acid sequences.

Results: We explore machine learning methods for training classifiers to predict the amino acid residues that are likely to be glycosylated using information derived from the target amino acid residue and its sequence neighbors. We compare the performance of Support Vector Machine classifiers and ensembles of Support Vector Machine classifiers trained on a dataset of experimentally determined N-linked, O-linked, and C-linked glycosylation sites extracted from O-GlycBase version 6.00, a database of 242 proteins from several different species. The results of our experiments show that the ensembles of Support Vector Machine classifiers outperform single Support Vector Machine classifiers on the problem of predicting glycosylation sites in terms of a range of standard measures for comparing the performance of classifiers. The resulting methods have been implemented in EnsembleGly, a web server for glycosylation site prediction.

Conclusion: Ensembles of Support Vector Machine classifiers offer an accurate and reliable approach to automated identification of putative glycosylation sites in glycoprotein sequences.
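The idea of describing a target residue by "its sequence neighbors" can be illustrated with a fixed-width window around each candidate position. The sketch below is only an illustration under stated assumptions: the function name `residue_windows`, the candidate residue set, the window half-width, and the padding character are all hypothetical choices, not the feature encoding used by the authors or by EnsembleGly.

```python
def residue_windows(sequence, targets="NST", half_width=3, pad="-"):
    """Yield (position, window) for each candidate residue.

    Assumptions (not from the abstract): candidates are N, S and T,
    and sequence ends are padded with '-' so every window has equal length.
    """
    padded = pad * half_width + sequence + pad * half_width
    for i, residue in enumerate(sequence):
        if residue in targets:
            # The slice is centred on position i of the original sequence.
            yield i, padded[i : i + 2 * half_width + 1]


# Toy sequence, purely illustrative.
windows = list(residue_windows("MKNGSA", half_width=2))
```

Each window could then be encoded numerically and passed to any classifier; the choice of encoding and classifier is left open here.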

hybrid intelligent systems | 2004

A Framework for Learning from Distributed Data Using Sufficient Statistics and Its Application to Learning Decision Trees

Doina Caragea; Adrian Silvescu; Vasant G. Honavar

This paper motivates and precisely formulates the problem of learning from distributed data; describes a general strategy for transforming traditional machine learning algorithms into algorithms for learning from distributed data; demonstrates the application of this strategy to devise algorithms for decision tree induction from distributed data; and identifies the conditions under which the algorithms in the distributed setting are superior to their centralized counterparts in terms of time and communication complexity. The resulting algorithms are provably exact in that the decision tree constructed from distributed data is identical to that obtained in the centralized setting. Some natural extensions leading to algorithms for learning from heterogeneous distributed data and learning under privacy constraints are outlined.
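The exactness claim can be made concrete with a small sketch. It assumes horizontally fragmented data (each site holds a subset of the rows) and uses information gain as the splitting criterion; both are assumptions for illustration, and the names `local_counts`, `information_gain`, `site_a`, and `site_b` are hypothetical. Each site sends only counts of (attribute value, class) pairs, and the summed counts give the same split score as the pooled data.

```python
from collections import Counter
from math import log2


def local_counts(rows, attr, label="label"):
    """Sufficient statistics a site would send: (attribute value, class) counts."""
    return Counter((row[attr], row[label]) for row in rows)


def entropy(freqs, size):
    """Shannon entropy of a list of counts that sum to `size`."""
    return -sum((n / size) * log2(n / size) for n in freqs if n)


def information_gain(counts):
    """Information gain of a split, computed from (value, class) counts only."""
    total = sum(counts.values())
    class_totals, value_totals = Counter(), Counter()
    for (value, cls), n in counts.items():
        class_totals[cls] += n
        value_totals[value] += n
    remainder = 0.0
    for value, size in value_totals.items():
        per_class = [n for (v, _), n in counts.items() if v == value]
        remainder += (size / total) * entropy(per_class, size)
    return entropy(class_totals.values(), total) - remainder


# Toy data held at two sites (hypothetical).
site_a = [{"outlook": "sunny", "label": "no"}, {"outlook": "rain", "label": "yes"}]
site_b = [
    {"outlook": "sunny", "label": "no"},
    {"outlook": "rain", "label": "no"},
    {"outlook": "overcast", "label": "yes"},
]

# Summing per-site counts reproduces the counts of the pooled data.
merged = local_counts(site_a, "outlook") + local_counts(site_b, "outlook")
centralized = local_counts(site_a + site_b, "outlook")
```

Because the split score depends on the data only through these counts, a tree grown from `merged` statistics would choose the same splits as one grown on the pooled rows, and only the counts need to be communicated.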

symposium on abstraction, reformulation and approximation | 2002

Ontology-Driven Induction of Decision Trees at Multiple Levels of Abstraction

Jun Zhang; Adrian Silvescu; Vasant G. Honavar

Most learning algorithms for data-driven induction of pattern classifiers (e.g., the decision tree algorithm) typically represent input patterns at a single level of abstraction, usually in the form of an ordered tuple of attribute values. However, in many applications of inductive learning (e.g., scientific discovery), users often need to explore a data set at multiple levels of abstraction, and from different points of view. Each point of view corresponds to a set of ontological (and representational) commitments regarding the domain of interest. The choice of an ontology induces a set of representations of the data and a set of transformations of the hypothesis space. This paper formalizes the problem of inductive learning using ontologies and data; describes an ontology-driven decision tree learning algorithm to learn classification rules at multiple levels of abstraction; and presents preliminary results to demonstrate the feasibility of the proposed approach.

workshop on mobile computing systems and applications | 2003

Information extraction and integration from heterogeneous, distributed, autonomous information sources - a federated ontology-driven query-centric approach

Jaime A Reinoso Castillo; Adrian Silvescu; Doina Caragea; Jyotishman Pathak; Vasant G. Honavar

This paper motivates and describes the data integration component of the INDUS (Intelligent Data Understanding System) environment for data-driven information extraction and integration from heterogeneous, distributed, autonomous information sources. The design of INDUS is motivated by the requirements of applications such as scientific discovery, in which it is desirable for users to be able to access, flexibly interpret, and analyze data from diverse sources from different perspectives in different contexts. INDUS implements a federated, query-centric approach to data integration using user-specified ontologies.

international conference on data mining | 2005

Discriminatively trained Markov model for sequence classification

Oksana Yakhnenko; Adrian Silvescu; Vasant G. Honavar

In this paper, we propose a discriminative counterpart of the directed Markov Models of order k - 1, or MM(k - 1), for sequence classification. MM(k - 1) models capture dependencies among neighboring elements of a sequence. The parameters of the classifiers are initialized based on the maximum likelihood estimates for their generative counterparts. We derive gradient-based update equations for the parameters of the sequence classifiers in order to maximize the conditional likelihood function. Results of our experiments with data sets drawn from biological sequence classification (specifically protein function and subcellular localization) and text classification applications show that the discriminatively trained sequence classifiers outperform their generative counterparts, confirming the benefits of discriminative training when the primary objective is classification. Our experiments also show that the discriminatively trained MM(k - 1) sequence classifiers are competitive with the computationally much more expensive Support Vector Machines trained using k-gram representations of sequences.

Knowledge and Information Systems | 2006

Learning accurate and concise naïve Bayes classifiers from attribute value taxonomies and data

Jun Zhang; Dae-Ki Kang; Adrian Silvescu; Vasant G. Honavar

In many application domains, there is a need for learning algorithms that can effectively exploit attribute value taxonomies (AVT)—hierarchical groupings of attribute values—to learn compact, comprehensible and accurate classifiers from data—including data that are partially specified. This paper describes AVT-NBL, a natural generalization of the naïve Bayes learner (NBL), for learning classifiers from AVT and data. Our experimental results show that AVT-NBL is able to generate classifiers that are substantially more compact and more accurate than those produced by NBL on a broad range of data sets with different percentages of partially specified values. We also show that AVT-NBL is more efficient in its use of training data: AVT-NBL produces classifiers that outperform those produced by NBL using substantially fewer training examples.
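An attribute value taxonomy can be pictured as a tree in which counts observed at specific values also contribute to every more abstract value above them. The sketch below illustrates only that count propagation; it is not AVT-NBL itself. The taxonomy `PARENT`, the function `propagate_counts`, and the treatment of a partially specified value as a count recorded directly at an internal node are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical taxonomy: child value -> parent value.
PARENT = {
    "red": "warm",
    "orange": "warm",
    "blue": "cool",
    "warm": "color",
    "cool": "color",
}


def propagate_counts(observed, parent):
    """Add each observed count to the value itself and to all its ancestors."""
    totals = Counter()
    for value, n in observed.items():
        node = value
        while node is not None:
            totals[node] += n
            node = parent.get(node)  # None once the root has been passed
    return totals


# "warm": 4 stands for partially specified observations (assumed handling).
observed = Counter({"red": 3, "orange": 1, "blue": 2, "warm": 4})
totals = propagate_counts(observed, PARENT)
```

With counts available at every level of the taxonomy, a learner is free to estimate class-conditional probabilities at a coarser or finer level of abstraction, which is the kind of trade-off between compactness and accuracy the abstract describes.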

intelligent systems design and applications | 2003

Decision Tree Induction from Distributed Heterogeneous Autonomous Data Sources

Doina Caragea; Adrian Silvescu; Vasant G. Honavar

With the growing use of distributed information networks, there is an increasing need for algorithmic and system solutions for data-driven knowledge acquisition using distributed, heterogeneous and autonomous data repositories. In many applications, practical constraints require such systems to provide support for data analysis where the data and the computational resources are available. This presents us with distributed learning problems. We precisely formulate a class of distributed learning problems; present a general strategy for transforming traditional machine learning algorithms into distributed learning algorithms; and demonstrate the application of this strategy to devise algorithms for decision tree induction (using a variety of splitting criteria) from distributed data. The resulting algorithms are provably exact in that the decision tree constructed from distributed data is identical to that obtained by the corresponding algorithm in the batch setting. The distributed decision tree induction algorithms have been implemented as part of INDUS, an agent-based system for data-driven knowledge acquisition from heterogeneous, distributed, autonomous data sources.

acm/ieee joint conference on digital libraries | 2013

Can't see the forest for the trees?: a citation recommendation system

Cornelia Caragea; Adrian Silvescu; Prasenjit Mitra; C. Lee Giles

Scientists continue to be challenged by the ever-increasing amount of information that has been produced worldwide during the last decades. When writing a paper, an author searches for the most relevant citations that started or were the foundation of a particular topic, which would very likely explain the thinking or algorithms that are employed. The search is usually done using specific keywords submitted to literature search engines such as Google Scholar and CiteSeer. However, finding relevant citations is distinct from producing articles that are only topically similar to an author's proposal. In this paper, we address the problem of citation recommendation using a singular value decomposition approach. The models are trained and evaluated on the CiteSeer digital library. The results of our experiments show that the proposed approach achieves significant success when compared with collaborative filtering methods on the citation recommendation task.

data integration in the life sciences | 2005

Information integration and knowledge acquisition from semantically heterogeneous biological data sources

Doina Caragea; Jyotishman Pathak; Jie Bao; Adrian Silvescu; Carson M. Andorf; Drena Dobbs; Vasant G. Honavar

We present INDUS (Intelligent Data Understanding System), a federated, query-centric system for knowledge acquisition from autonomous, distributed, semantically heterogeneous data sources that can be viewed (conceptually) as tables. INDUS employs ontologies and inter-ontology mappings to enable a user or an application to view a collection of such data sources (regardless of location, internal structure and query interfaces) as though they were a collection of tables structured according to an ontology supplied by the user. This allows INDUS to answer user queries against distributed, semantically heterogeneous data sources without the need for a centralized data warehouse or a common global ontology. We used the INDUS framework to design algorithms for learning probabilistic models (e.g., Naive Bayes models) for predicting GO functional classification of a protein based on training sequences that are distributed among SWISSPROT and MIPS data sources. Mappings such as EC2GO and MIPS2GO were used to resolve the semantic differences between these data sources when answering queries posed by the learning algorithms. Our results show that INDUS can be successfully used for integrative analysis of data from multiple sources needed for collaborative discovery in computational biology.

Lecture Notes in Computer Science | 2001

Analysis and Synthesis of Agents That Learn from Distributed Dynamic Data Sources

Doina Caragea; Adrian Silvescu; Vasant G. Honavar

We propose a theoretical framework for specification and analysis of a class of learning problems that arise in open-ended environments that contain multiple, distributed, dynamic data and knowledge sources. We introduce a family of learning operators for precise specification of some existing solutions and to facilitate the design and analysis of new algorithms for this class of problems. We state some properties of instance and hypothesis representations, and learning operators that make exact learning possible in some settings. We also explore some relationships between models of learning using different subsets of the proposed operators under certain assumptions.
