Daniel P. Miranker | Researchain

Publication

Featured research published by Daniel P. Miranker.

International World Wide Web Conferences | 2012

On directly mapping relational databases to RDF and OWL

Juan F. Sequeda; Marcelo Arenas; Daniel P. Miranker

Mapping relational databases to RDF is a fundamental problem for the development of the Semantic Web. We present a solution, inspired by draft methods defined by the W3C, in which relational databases are directly mapped to RDF and OWL. Given a relational database schema and its integrity constraints, this direct mapping produces an OWL ontology, which provides the basis for generating RDF instances. The semantics of this mapping is defined using Datalog. Two fundamental properties are information preservation and query preservation. We prove that our mapping satisfies both conditions, even for relational databases that contain null values. We also consider two desirable properties: monotonicity and semantics preservation. We prove that our mapping is monotone and also prove that no monotone mapping, including ours, is semantics preserving. Monotonicity is therefore an obstacle to semantics preservation, so we also present a non-monotone direct mapping that is semantics preserving.
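As a rough illustration of what a direct mapping produces at the instance level, the Python sketch below maps the rows of one table to RDF triples in the general style of the W3C Direct Mapping: each row gets a subject IRI built from its primary key, each non-null column value becomes one triple, and NULLs produce nothing. The base IRI `http://example.org/`, the `EMP` table, and the function names are hypothetical; the sketch leaves literals untyped and ignores foreign keys and the OWL ontology, so it is not the Datalog-defined mapping from the paper.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical base IRI
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def row_iri(table, pk_cols, row):
    # Subject IRI built from the table name and primary-key values,
    # in the style <base/Table/col=val;col2=val2>.
    key = ";".join(f"{quote(c)}={quote(str(row[c]))}" for c in pk_cols)
    return f"<{BASE}{quote(table)}/{key}>"

def direct_map(table, pk_cols, rows):
    """Yield (subject, predicate, object) triples for every row of one table."""
    for row in rows:
        s = row_iri(table, pk_cols, row)
        # Every row is typed with a class named after its table.
        yield (s, RDF_TYPE, f"<{BASE}{quote(table)}>")
        for col, val in row.items():
            if val is None:
                continue  # NULL values produce no triple
            # Literals are left untyped here for brevity.
            yield (s, f"<{BASE}{quote(table)}#{quote(col)}>", f'"{val}"')

rows = [{"id": 7, "name": "Alice", "dept": None}]
for triple in direct_map("EMP", ["id"], rows):
    print(*triple, ".")
```

Skipping NULLs rather than emitting a placeholder value is the design choice that keeps the output a plain set of RDF triples.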

Journal of Web Semantics | 2013

Ultrawrap: SPARQL execution on relational data

Juan F. Sequeda; Daniel P. Miranker

The Semantic Web's promise of web-wide data integration requires the inclusion of legacy relational databases, i.e. the execution of SPARQL queries on an RDF representation of the legacy relational data. We explore a hypothesis: existing commercial relational databases already subsume the algorithms and optimizations needed to support effective SPARQL execution on existing relationally stored data. The experiment is embodied in a system, Ultrawrap, that encodes a logical representation of the database as an RDF graph using SQL views and a simple syntactic translation of SPARQL queries to SQL queries on those views. Thus, in the course of executing a SPARQL query, the SQL optimizer uses the SQL views that represent a mapping of relational data to RDF and optimizes the query's execution. In contrast, related research is predicated on incorporating optimizing transforms into the SPARQL-to-SQL translation and/or executing some of the queries outside the underlying SQL environment. Ultrawrap is evaluated using two existing benchmark suites that derive their RDF data from relational data through a Relational Database to RDF (RDB2RDF) Direct Mapping, and the evaluation is repeated for each of the three major relational database management systems. Empirical analysis reveals two existing relational query optimizations that, if applied to the SQL produced from a simple syntactic translation of SPARQL queries (with bound predicate arguments) to SQL, consistently yield query execution times comparable to those of SQL queries written directly for the relational representation of the data. The analysis further reveals that the two optimizations are not uniquely required to achieve a successful wrapper system. The evidence suggests that effective wrappers will be those designed to complement the optimizer of the target database.
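The core idea, exposing relational data as a triple-shaped SQL view and translating SPARQL purely syntactically so that the database optimizer does the real work, can be sketched with Python's built-in `sqlite3` module. Everything here is an assumption for illustration: the `emp` table, the single `tripleview` view, the IRI-like strings, and the `bgp_to_sql` helper, which handles only basic graph patterns whose predicates are bound. It is not Ultrawrap's code, and SQLite stands in for a commercial RDBMS only for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp(id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
INSERT INTO emp VALUES (1, 'Alice', 'R&D'), (2, 'Bob', 'Sales');
-- One view exposing every non-null attribute as an (s, p, o) triple.
CREATE VIEW tripleview AS
  SELECT 'emp/' || id AS s, 'emp#name' AS p, name AS o FROM emp WHERE name IS NOT NULL
  UNION ALL
  SELECT 'emp/' || id, 'emp#dept', dept FROM emp WHERE dept IS NOT NULL;
""")

def bgp_to_sql(patterns):
    """Syntactically translate triple patterns (bound predicates only) to SQL.

    Variables start with '?'; each pattern becomes one alias of tripleview,
    and a repeated variable becomes an equality join between aliases.
    """
    froms, where, params, bound = [], [], [], {}
    for i, (s, p, o) in enumerate(patterns):
        t = f"t{i}"
        froms.append(f"tripleview AS {t}")
        where.append(f"{t}.p = ?")
        params.append(p)
        for col, term in (("s", s), ("o", o)):
            ref = f"{t}.{col}"
            if term.startswith("?"):
                if term in bound:
                    where.append(f"{ref} = {bound[term]}")  # join condition
                else:
                    bound[term] = ref                        # first occurrence
            else:
                where.append(f"{ref} = ?")                   # constant term
                params.append(term)
    select = ", ".join(f"{ref} AS {var[1:]}" for var, ref in bound.items()) or "1"
    sql = f"SELECT {select} FROM {', '.join(froms)} WHERE {' AND '.join(where)}"
    return sql, params

sql, params = bgp_to_sql([("?e", "emp#name", "?n"), ("?e", "emp#dept", "Sales")])
print(conn.execute(sql, params).fetchall())  # [('emp/2', 'Bob')]
```

The translation does no optimization of its own; whether the optimizer can then prune the `UNION ALL` branches and simplify the self-joins is the kind of question the paper's empirical analysis addresses.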

Database and Expert Systems Applications | 2008

Translating SQL Applications to the Semantic Web

Syed Hamid Tirmizi; Juan F. Sequeda; Daniel P. Miranker

The content of most Web pages is dynamically derived from an underlying relational database. Thus, the success of the Semantic Web hinges on enabling access to relational databases and their content by semantic methods. We define a system for automatic transformation of SQL DDL schemas into OWL DL ontologies. This system goes further than earlier efforts in that the entire system is expressed in first-order logic. We leverage the formal approach to show that the system is complete with respect to the space of possible relations that can be formed among relational tables as a consequence of primary and foreign key combinations. The full set of transformation rules is stratified, so the system can be executed directly by a Datalog interpreter.
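The basic shape of a DDL-to-OWL translation can be shown with a small Python sketch: each table becomes an `owl:Class`, each foreign-key column an `owl:ObjectProperty` whose range is the referenced table's class, and every other column an `owl:DatatypeProperty`. The `Table` description, the `ex:` prefix, and the property naming scheme are hypothetical, and only single-column foreign keys are handled; the paper's rule set covers the full space of primary and foreign key combinations, which this sketch does not attempt.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    name: str
    columns: list[str]
    primary_key: list[str]
    # Column name -> referenced table name (hypothetical; single-column FKs only).
    foreign_keys: dict[str, str] = field(default_factory=dict)

def ddl_to_owl(tables):
    """Emit Turtle-style OWL axioms for a simplified schema description."""
    axioms = []
    for t in tables:
        axioms.append(f"ex:{t.name} a owl:Class .")
        for col in t.columns:
            prop = f"ex:{t.name}_{col}"
            if col in t.foreign_keys:
                # A foreign key becomes an object property linking two classes.
                axioms.append(f"{prop} a owl:ObjectProperty ; "
                              f"rdfs:domain ex:{t.name} ; rdfs:range ex:{t.foreign_keys[col]} .")
            else:
                # Any other column becomes a datatype property of the table's class.
                axioms.append(f"{prop} a owl:DatatypeProperty ; rdfs:domain ex:{t.name} .")
    return axioms

schema = [Table("dept", ["id", "title"], ["id"]),
          Table("emp", ["id", "name", "dept_id"], ["id"], {"dept_id": "dept"})]
print("\n".join(ddl_to_owl(schema)))
```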

Bioinformatics | 2009

Integrating shotgun proteomics and mRNA expression data to improve protein identification

Smriti R. Ramakrishnan; Christine Vogel; John T. Prince; Zhihua Li; Luiz O. F. Penalva; Margaret Myers; Edward M. Marcotte; Daniel P. Miranker; Rong Wang

Motivation: Tandem mass spectrometry (MS/MS) offers fast and reliable characterization of complex protein mixtures, but suffers from low sensitivity in protein identification. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, other information is often available; for example, the probability of a protein's presence is likely to correlate with its mRNA concentration.

Results: We develop a Bayesian score that estimates the posterior probability of a protein's presence in the sample given its identification in an MS/MS experiment and its mRNA concentration measured under similar experimental conditions. Our method, MSpresso, substantially increases the number of proteins identified in an MS/MS experiment at the same error rate; in yeast, for example, MSpresso increases the number of proteins identified by ∼40%. We apply MSpresso to data from different MS/MS instruments, experimental conditions and organisms (Escherichia coli, human), and predict 19–63% more proteins across the different datasets. MSpresso demonstrates that incorporating prior knowledge of protein presence into shotgun proteomics experiments can substantially improve protein identification scores.

Availability and Implementation: Software is available upon request from the authors. Mass spectrometry datasets and supplementary information are available from http://www.marcottelab.org/MSpresso/.

Contact: [email protected]; [email protected]

Supplementary Information: Supplementary data website: http://www.marcottelab.org/MSpresso/.
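The abstract does not give MSpresso's scoring formula, so the following is only a generic Bayesian sketch of the idea, not the published method. It assumes the MS/MS analysis reports a probability `p_ms` computed under a flat prior, strips that prior out as a likelihood ratio, and applies a prior derived from mRNA concentration instead, treating the two sources of evidence as conditionally independent. In odds form: posterior odds = odds(mrna_prior) × odds(p_ms) / odds(flat_prior). The function names and numbers are hypothetical.

```python
def odds(p):
    """Convert a probability in (0, 1) to odds."""
    return p / (1.0 - p)

def reprioritized_probability(p_ms, flat_prior, mrna_prior):
    """Posterior P(present | MS/MS, mRNA) under a simple Bayes update (sketch).

    p_ms       -- protein probability from MS/MS alone, computed under flat_prior
    flat_prior -- the uniform prior assumed by the MS/MS-only analysis
    mrna_prior -- prior P(present) estimated from the protein's mRNA level
    Assumes MS/MS evidence and mRNA level are conditionally independent
    given presence or absence of the protein.
    """
    likelihood_ratio = odds(p_ms) / odds(flat_prior)  # remove the flat prior
    post_odds = likelihood_ratio * odds(mrna_prior)   # apply the mRNA-based prior
    return post_odds / (1.0 + post_odds)

print(round(reprioritized_probability(0.6, 0.5, 0.8), 3))  # 0.857
```

With a flat prior of 0.5, an MS/MS probability of 0.6 rises to 6/7 when the mRNA-based prior is 0.8, stays at 0.6 when that prior is also 0.5, and falls when the prior is low.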

TREAT: A New and Efficient Match Algorithm for AI Production System | 1990

Conclusions and Future Research

Daniel P. Miranker

The TREAT algorithm for production system execution draws upon a new source of state information for production system algorithms: conflict-set support. Empirical study of several OPS5 programs has shown that TREAT performs 30 to 50 percent fewer comparisons than the RETE match when searching for variable bindings. The analysis of several TREAT-OPS5 implementations, both sequential and parallel, shows that there is little rule parallelism available in OPS5 programs. The PM-level distribution of rules in a DADO machine is a natural way to organize rule-based systems. Despite the coarse granularity of the rule-level parallelism, a DADO machine may be partitioned accordingly. The subtrees below the PM level add further parallelism by permitting parallel access to, and parallel matching of, the working memory relevant to a particular rule. For the implementation of PM-level distributed production systems, the SIMD metaphor outperforms asynchronous message passing and is also simpler to implement. A coarser-grain DADO, one built from a smaller number of currently available 32-bit processor chips, would perform comparably to other proposed production system machine architectures but would be considerably simpler and less expensive.
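The idea that TREAT relies on conflict-set support instead of RETE's stored partial joins can be illustrated with a deliberately small Python sketch. It keeps one alpha memory of matching working-memory elements (WMEs) per condition element, recomputes joins seeded with each newly added WME, and handles deletions by dropping conflict-set instantiations that contain the deleted WME, with no join work at all. The class and function names are hypothetical, only positive condition elements are supported, and there is no conflict resolution, so this is not the book's TREAT-OPS5 implementation.

```python
from itertools import product

def match_pattern(pattern, wme, bindings):
    """Unify one condition element with a WME; return extended bindings or None."""
    if len(pattern) != len(wme):
        return None
    b = dict(bindings)
    for term, value in zip(pattern, wme):
        if isinstance(term, str) and term.startswith("?"):
            if b.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None      # constant test failed
    return b

class TreatRule:
    """Positive-only rule matched TREAT-style: alpha memories, no beta memories."""

    def __init__(self, name, conditions):
        self.name = name
        self.conditions = conditions
        self.alpha = [set() for _ in conditions]  # WMEs passing each CE on its own

    def add(self, wme, conflict_set):
        for i, ce in enumerate(self.conditions):
            if match_pattern(ce, wme, {}) is None:
                continue
            self.alpha[i].add(wme)
            # Seed the join with the new WME in position i; other positions are
            # re-joined from their alpha memories, since no partial joins are stored.
            pools = [m if j != i else {wme} for j, m in enumerate(self.alpha)]
            for combo in product(*pools):
                b = {}
                for ce_j, w in zip(self.conditions, combo):
                    b = match_pattern(ce_j, w, b)
                    if b is None:
                        break
                if b is not None:
                    conflict_set.add((self.name, combo))

    def remove(self, wme, conflict_set):
        for m in self.alpha:
            m.discard(wme)
        # Conflict-set support: a deletion needs no matching, only removal of
        # the instantiations that mention the deleted WME.
        conflict_set -= {inst for inst in conflict_set
                         if inst[0] == self.name and wme in inst[1]}

rule = TreatRule("grandparent", [("parent", "?x", "?y"), ("parent", "?y", "?z")])
cs = set()
rule.add(("parent", "ann", "bob"), cs)
rule.add(("parent", "bob", "cid"), cs)
print(cs)  # {('grandparent', (('parent', 'ann', 'bob'), ('parent', 'bob', 'cid')))}
rule.remove(("parent", "ann", "bob"), cs)
print(cs)  # set()
```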

Knowledge Engineering Review | 2011

Review: Survey of directly mapping SQL databases to the Semantic Web

Juan F. Sequeda; Syed Hamid Tirmizi; Oscar Corcho; Daniel P. Miranker

The Semantic Web anticipates integrated access to a large number of information sources on the Internet represented in the Resource Description Framework (RDF). Given the large number of websites that are backed by SQL databases, methods that automate the translation of those databases to RDF are crucial. One approach, taken by a number of researchers, is to directly map the SQL schema to an equivalent Web Ontology Language (OWL) or RDF Schema representation, which, in turn, implies an RDF representation for the relational data. This paper reviews this research and derives a consolidated, overarching set of translation rules expressible as a stratified Datalog program. We present all the possible key combinations in an SQL schema and consider their implied semantic properties. We review the approaches and characterize them with respect to the scope of their coverage of SQL constructs.
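To show what stratification buys, the sketch below encodes two simplified rules of the kind the survey consolidates and evaluates them stratum by stratum in Python: a table whose only columns form a two-column primary key made of foreign keys is treated as a many-to-many relationship, and every other table becomes a class. The second rule uses negation over the first, so it is only safe to evaluate once the first stratum is complete. The rule set and fact encoding are hypothetical simplifications, not the survey's actual rules.

```python
def classify(tables, pk, fk, cols):
    """Two-stratum evaluation of simplified mapping rules (hypothetical rule set).

    tables -- set of table names
    pk     -- set of (table, column) primary-key facts
    fk     -- set of (table, column, referenced_table) foreign-key facts
    cols   -- set of (table, column) facts for all columns
    """
    # Stratum 1 (no negation):
    #   BinaryRel(T) :- PK(T) has two columns, both foreign keys, and T has no other columns.
    binary_rel = set()
    for t in tables:
        key = {c for (tt, c) in pk if tt == t}
        all_cols = {c for (tt, c) in cols if tt == t}
        fk_cols = {c for (tt, c, _) in fk if tt == t}
        if len(key) == 2 and key == all_cols and key <= fk_cols:
            binary_rel.add(t)
    # Stratum 2 (negation over the completed lower stratum):
    #   Class(T) :- Table(T), not BinaryRel(T).
    classes = {t for t in tables if t not in binary_rel}
    return classes, binary_rel

tables = {"student", "course", "enrolled"}
pk = {("student", "id"), ("course", "id"), ("enrolled", "sid"), ("enrolled", "cid")}
fk = {("enrolled", "sid", "student"), ("enrolled", "cid", "course")}
cols = pk | {("student", "name"), ("course", "title")}
classes, links = classify(tables, pk, fk, cols)
print(sorted(classes), sorted(links))  # ['course', 'student'] ['enrolled']
```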

Bioinformatics | 2004

A metric model of amino acid substitution

Weijia Xu; Daniel P. Miranker

Motivation: We address the question of whether there exists an effective evolutionary model of amino acid substitution that forms a metric distance function. There is always a trade-off between speed and sensitivity among competing computational methods for determining sequence homology. A metric model of evolution is a prerequisite for the development of an entire class of fast sequence analysis algorithms that are both scalable (O(log n)) and sensitive.

Results: We have reworked the mathematics of the point accepted mutation (PAM) model by calculating the expected time between accepted mutations in lieu of calculating log-odds probabilities. The resulting substitution matrix (mPAM) forms a metric. We validate the application of the mPAM evolutionary model to sequence homology by executing sequence queries from a controlled yeast protein homology search benchmark. We compare the accuracy of the results of mPAM and PAM similarity matrices as well as three prior metric models. The experiment shows that mPAM significantly outperforms the other three metrics and sufficiently approaches the sensitivity of PAM250 to make it applicable to the management of protein sequence databases.
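The value of a metric is that the triangle inequality lets a search skip candidates without comparing them directly, which is what metric-space indexes rely on. The Python sketch below checks the metric axioms for a substitution distance table, extends it to equal-length sequences by summing per-position distances (a sum of metrics is again a metric), and shows the pruning test. The three-letter alphabet and its distances are made up for illustration and are not mPAM values; gapped alignment is not handled.

```python
from itertools import product

def is_metric(d, alphabet):
    """Check the metric axioms for a substitution distance table d[(a, b)]."""
    for a, b in product(alphabet, repeat=2):
        if d[(a, b)] < 0 or d[(a, b)] != d[(b, a)]:
            return False  # non-negativity or symmetry fails
        if (d[(a, b)] == 0) != (a == b):
            return False  # identity of indiscernibles fails
    return all(d[(a, c)] <= d[(a, b)] + d[(b, c)]
               for a, b, c in product(alphabet, repeat=3))  # triangle inequality

def seq_distance(d, s, t):
    """Ungapped distance between equal-length sequences: sum of per-position distances."""
    return sum(d[(x, y)] for x, y in zip(s, t))

def can_prune(d_q_pivot, d_pivot_x, radius):
    # Triangle inequality: d(q, x) >= |d(q, p) - d(p, x)|, so candidate x can be
    # skipped without computing d(q, x) when that lower bound exceeds the radius.
    return abs(d_q_pivot - d_pivot_x) > radius

# Hypothetical toy distances over a three-letter alphabet (not mPAM values).
alphabet = "ACD"
base = {("A", "C"): 2, ("A", "D"): 3, ("C", "D"): 4}
d = {(a, a): 0 for a in alphabet}
for (a, b), v in base.items():
    d[(a, b)] = d[(b, a)] = v
print(is_metric(d, alphabet))  # True
```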

International Conference on Management of Data | 1993

Index support for rule activation

David A. Brant; Daniel P. Miranker

Integrated rule and database systems are quickly moving from the research laboratory into commercial systems. However, the current generation of prototypes is designed to work with small rule sets involving limited inferencing. The problem of supporting large, complex rule programs within database management systems still presents significant challenges, many of which stem from providing support for rule activation. Rule activation is defined as the process of determining which rules are satisfied and what data satisfies them. In this paper we present performance results for the DATEX database rule system and its novel indexing technique for supporting rule activation. Our approach assumes that the rule program and the database must be optimized synergistically. Nevertheless, we determined experimentally that DATEX requires very few changes to a standard DBMS environment, and we argue that these changes are reasonable for the problems being solved. Based on the performance of DATEX, we believe we have demonstrated a satisfactory solution to the rule activation problem for complex rule programs operating within a database system.
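The abstract does not describe how DATEX's index works, so the following is only a generic illustration of index-supported rule activation, not the DATEX technique. Each rule's equality tests on a single relation are indexed by (relation, attribute, value), so an inserted tuple is checked only against rules that share at least one of its values. The `ActivationIndex` class and the example rules are hypothetical; rules with no tests are never activated, and joins across relations are out of scope.

```python
from collections import defaultdict

class ActivationIndex:
    """Index single-relation rule conditions by their equality tests (generic sketch)."""

    def __init__(self):
        self.by_test = defaultdict(set)  # (relation, attribute, value) -> rule names
        self.conditions = {}             # rule name -> (relation, {attribute: value})

    def add_rule(self, name, relation, tests):
        self.conditions[name] = (relation, tests)
        for attr, value in tests.items():
            self.by_test[(relation, attr, value)].add(name)

    def activated_by(self, relation, tuple_):
        """Return the rules whose every test the inserted tuple satisfies."""
        candidates = set()
        for attr, value in tuple_.items():
            # Index lookup: only rules testing one of this tuple's values are candidates.
            candidates |= self.by_test.get((relation, attr, value), set())
        return {r for r in candidates
                if all(tuple_.get(a) == v for a, v in self.conditions[r][1].items())}

idx = ActivationIndex()
idx.add_rule("big_order", "orders", {"priority": "high"})
idx.add_rule("eu_high", "orders", {"priority": "high", "region": "EU"})
print(sorted(idx.activated_by("orders", {"id": 1, "priority": "high", "region": "US"})))  # ['big_order']
```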

Journal of Parallel and Distributed Computing | 1986

The DADO production system machine

Salvatore J. Stolfo; Daniel P. Miranker

DADO is a parallel, tree-structured machine designed to provide significant performance improvements in the execution of large expert systems implemented in production system form. A full-scale version of the DADO machine would comprise a large set of processing elements (PEs) (on the order of thousands), each containing its own processor, a small amount (20K bytes, in the current prototype design) of local random access memory, and a specialized I/O switch. The PEs are interconnected to form a complete binary tree. This paper describes the application domain of the DADO machine and the rationale for its design. Parallel algorithms for production system execution are briefly described. We then focus on the machine architecture and detail the hardware design of a moderately large prototype comprising 1023 microprocessors currently operational at Columbia University. We conclude with very encouraging performance statistics recently calculated from an analysis of simulations of the system.
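The prototype's size fits the tree topology exactly: a complete binary tree with k levels has 2^k - 1 nodes, and 2^10 - 1 = 1023, so the 1023 processors form 10 levels, and a value can be combined from every PE up to the root in 9 parent-child steps. The Python sketch below checks that arithmetic using heap-style numbering (PE i has children 2i + 1 and 2i + 2), which is an assumption for illustration rather than DADO's documented addressing, and simulates a level-by-level reduction such as selecting a minimum.

```python
def children(i, n):
    """Heap-style numbering: PE i has children 2i+1 and 2i+2 (root is PE 0)."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < n]

def levels(n):
    """Number of levels in a complete binary tree holding n = 2**k - 1 nodes."""
    return (n + 1).bit_length() - 1

def tree_reduce(values, combine):
    """Combine per-PE values bottom-up, one tree level per step; return (result, steps)."""
    n = len(values)
    acc = list(values)
    steps = 0
    level_start = (n + 1) // 2 - 1  # index of the first leaf in a full tree
    while level_start > 0:
        parent_start = (level_start - 1) // 2
        # Every parent on the level above folds in its already-reduced children.
        for p in range(parent_start, level_start):
            for c in children(p, n):
                acc[p] = combine(acc[p], acc[c])
        level_start = parent_start
        steps += 1
    return acc[0], steps

n = 1023
print(levels(n))                                   # 10
best, steps = tree_reduce(list(range(n, 0, -1)), min)
print(best, steps)                                 # 1 9
```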

Bioinformatics | 2009

Mining gene functional networks to improve mass-spectrometry-based protein identification

Smriti R. Ramakrishnan; Christine Vogel; Taejoon Kwon; Luiz O. F. Penalva; Edward M. Marcotte; Daniel P. Miranker

Motivation: High-throughput protein identification experiments based on tandem mass spectrometry (MS/MS) often suffer from low sensitivity and low-confidence protein identifications. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, there is often other evidence to suggest that a protein is present, and confidence in individual protein identifications can be updated accordingly.

Results: We develop a method that analyzes MS/MS experiments in the larger context of the biological processes active in a cell. Our method, MSNet, improves protein identification in shotgun proteomics experiments by considering information on functional associations from a gene functional network. MSNet substantially increases the number of proteins identified in the sample at a given error rate. We identify 8–29% more proteins than the original MS experiment when applied to yeast grown in different experimental conditions analyzed on different MS/MS instruments, and 37% more proteins in a human sample. We validate up to 94% of our identifications in yeast by presence in ground-truth reference sets.

Availability and Implementation: Software and datasets are available at http://aug.csres.utexas.edu/msnet

Contact: [email protected], [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
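The abstract does not spell out MSNet's scoring rule, so the sketch below shows only the general idea of letting functional-network neighbors inform each protein's score: each MS/MS-derived probability is blended with the weighted mean of its neighbors' probabilities. The `network_rescore` function, the mixing weight `gamma`, and the example scores are hypothetical, and the sketch does not recalibrate scores against an error rate.

```python
def network_rescore(scores, edges, gamma=0.5, rounds=1):
    """Blend each protein's score with the weighted mean of its network neighbours.

    scores -- protein -> identification probability from MS/MS alone
    edges  -- (protein_a, protein_b, weight) functional-association links
    gamma  -- hypothetical mixing weight between own and neighbour evidence
    """
    neighbours = {p: [] for p in scores}
    for a, b, w in edges:
        if a in scores and b in scores:
            neighbours[a].append((b, w))
            neighbours[b].append((a, w))
    current = dict(scores)
    for _ in range(rounds):
        updated = {}
        for p, s in current.items():
            total = sum(w for _, w in neighbours[p])
            if total == 0:
                updated[p] = s  # isolated proteins keep their MS/MS score
                continue
            mean = sum(w * current[q] for q, w in neighbours[p]) / total
            updated[p] = (1 - gamma) * s + gamma * mean
        current = updated
    return current

scores = {"P1": 0.9, "P2": 0.3, "P3": 0.2}
edges = [("P1", "P2", 1.0)]
print(network_rescore(scores, edges))
```

Blending raises weakly supported proteins with well-supported neighbors but also pulls strong scores down; how to weigh that trade-off is a design choice this sketch leaves open.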
