Neeraj Koul | Researchain

Publication

Featured research published by Neeraj Koul.

web information and data management | 2005

A framework for semantic web services discovery

Jyotishman Pathak; Neeraj Koul; Doina Caragea; Vasant G. Honavar

This paper describes a framework for ontology-based flexible discovery of Semantic Web services. The proposed approach relies on user-supplied, context-specific mappings from a user ontology to relevant domain ontologies used to specify Web services. We show how a user's query for a Web service that meets certain selection criteria can be transformed into queries that can be processed by a matchmaking engine that is aware of the relevant domain ontologies and Web services. We also describe how user-specified preferences for Web services in terms of non-functional requirements (e.g., QoS) can be incorporated into the Web service discovery mechanism to generate a partially ordered list of services that meet user-specified functional requirements.
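
One way to picture a "partially ordered list" of services is Pareto dominance over QoS attributes. The sketch below is only an illustration under that assumption, not the paper's mechanism; the attribute names (`reliability`, `latency_ms`), the service names, and the layering function are all hypothetical.

```python
from typing import Dict, List

# Hypothetical QoS record: higher reliability is better, lower latency is better.
Service = Dict[str, float]


def dominates(a: Service, b: Service) -> bool:
    """True if a is at least as good as b on both attributes and strictly better on one."""
    at_least = a["reliability"] >= b["reliability"] and a["latency_ms"] <= b["latency_ms"]
    strictly = a["reliability"] > b["reliability"] or a["latency_ms"] < b["latency_ms"]
    return at_least and strictly


def rank_layers(services: Dict[str, Service]) -> List[List[str]]:
    """Peel off non-dominated services layer by layer.

    Services in the same layer are incomparable, which is what makes the
    result a partial order rather than a total ranking.
    """
    remaining = dict(services)
    layers: List[List[str]] = []
    while remaining:
        front = sorted(
            name
            for name, svc in remaining.items()
            if not any(dominates(other, svc) for other in remaining.values())
        )
        layers.append(front)
        for name in front:
            del remaining[name]
    return layers


# Hypothetical services that already satisfy the functional requirements.
candidates = {
    "svcA": {"reliability": 0.99, "latency_ms": 120.0},
    "svcB": {"reliability": 0.95, "latency_ms": 80.0},
    "svcC": {"reliability": 0.90, "latency_ms": 200.0},
}
layers = rank_layers(candidates)
print(layers)
```

Here `svcA` and `svcB` trade reliability against latency, so neither dominates the other and both land in the first layer, while `svcC` is dominated by `svcA` and falls into the second.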

Mammalian Genome | 2009

ANEXdb: an integrated animal ANnotation and microarray EXpression database

Oliver P. Couture; Keith M. Callenberg; Neeraj Koul; Sushain Pandit; Remy Younes; Zhi-Liang Hu; Jack C. M. Dekkers; James M. Reecy; Vasant G. Honavar; Christopher K. Tuggle

To determine annotations of the sequence elements on microarrays used for transcriptional profiling experiments in livestock species, currently researchers must either use the sparse direct annotations available for these species or create their own annotations. ANEXdb (http://www.anexdb.org) is an open-source web application that supports integrated access to two databases that house microarray expression (ExpressDB) and EST annotation (AnnotDB) data. The expression database currently supports storage and querying of Affymetrix-based expression data as well as retrieval of experiments in a form ready for NCBI-GEO submission; these services are available online. AnnotDB currently houses a novel assembly of approximately 1.6 million unique porcine-expressed sequence reads called the Iowa Porcine Assembly (IPA), which consists of 140,087 consensus sequences, the Iowa Tentative Consensus (ITC) sequences, and 103,888 singletons. The IPA has been annotated via transfer of information from homologs identified through sequence alignment to NCBI RefSeq. These annotated sequences have been mapped to the Affymetrix porcine array elements, providing annotation for 22,569 of the 23,937 (94%) porcine-specific probe sets, of which 19,253 (80%) are linked to an NCBI RefSeq entry. The ITC has also been mined for sequence variation, providing evidence for up to 202,383 SNPs, 62,048 deletions, and 958 insertions in porcine-expressed sequence. These results create a single location to obtain porcine annotation of and sequence variation in differentially expressed genes in expression experiments, thus permitting possible identification of causal variants in such genes of interest. The ANEXdb application is open source and available from SourceForge.net.

international semantic web conference | 2011

Learning relational bayesian classifiers from RDF data

Harris T. Lin; Neeraj Koul; Vasant G. Honavar

The increasing availability of large RDF datasets offers an exciting opportunity to use such data to build predictive models using machine learning algorithms. However, the massive size and distributed nature of RDF data call for approaches to learning from RDF data in a setting where the data can be accessed only through a query interface, e.g., the SPARQL endpoint of the RDF store. In applications where the data are subject to frequent updates, there is a need for algorithms that allow the predictive model to be incrementally updated in response to changes in the data. Furthermore, in some applications, the attributes that are relevant for specific prediction tasks are not known a priori and hence need to be discovered by the algorithm. We present an approach to learning Relational Bayesian Classifiers (RBCs) from RDF data that addresses such scenarios. Specifically, we show how to build RBCs from RDF data using statistical queries through the SPARQL endpoint of the RDF store. We compare the communication complexity of our algorithm with one that requires direct centralized access to the data and hence has to retrieve the entire RDF dataset from the remote location for processing. We establish the conditions under which the RBC models can be incrementally updated in response to addition or deletion of RDF data. We show how our approach can be extended to the setting where the attributes that are relevant for prediction are not known a priori, by selectively crawling the RDF data for attributes of interest. We provide an open source implementation and evaluate the proposed approach on several large RDF datasets.

web intelligence | 2008

Learning Classifiers from Large Databases Using Statistical Queries

Neeraj Koul; Cornelia Caragea; Vasant G. Honavar; Vikas Bahirwani; Doina Caragea

We describe an approach to learning predictive models from large databases in settings where direct access to the data is not available because of the massive size of the data, access restrictions, or bandwidth requirements. We outline some techniques for minimizing the number of statistical queries needed and for efficiently coping with missing values in the data. We provide an open source implementation of the decision tree and naive Bayes algorithms to demonstrate the feasibility of the proposed approach.
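
To make the statistical-query setting concrete, the sketch below learns a naive Bayes classifier while touching the data only through count queries. It is a minimal illustration, not the authors' implementation: `CountOracle`, the toy rows, and the Laplace smoothing are assumptions introduced here, and missing values are not handled.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


class CountOracle:
    """Hypothetical stand-in for a remote database that answers only count queries."""

    def __init__(self, rows: List[Row]):
        self._rows = rows
        self.queries = 0  # number of count queries issued so far

    def count(self, **conditions: str) -> int:
        """Number of rows matching every attribute=value condition."""
        self.queries += 1
        return sum(
            all(row[attr] == value for attr, value in conditions.items())
            for row in self._rows
        )


def learn_naive_bayes(oracle: CountOracle, label: str, classes: List[str],
                      attributes: Dict[str, List[str]]) -> dict:
    """Estimate priors and class-conditional probabilities from counts alone."""
    total = oracle.count()
    prior: Dict[str, float] = {}
    cond: Dict[Tuple[str, str, str], float] = {}
    for c in classes:
        n_c = oracle.count(**{label: c})
        prior[c] = n_c / total
        for attr, values in attributes.items():
            for v in values:
                n_cv = oracle.count(**{label: c, attr: v})
                # Laplace smoothing (an assumption of this sketch).
                cond[(c, attr, v)] = (n_cv + 1) / (n_c + len(values))
    return {"prior": prior, "cond": cond}


def predict(model: dict, classes: List[str], example: Row) -> str:
    """Return the class with the highest naive Bayes score."""
    def score(c: str) -> float:
        p = model["prior"][c]
        for attr, v in example.items():
            p *= model["cond"][(c, attr, v)]
        return p
    return max(classes, key=score)


# Toy data that the learner never reads directly.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
]
classes = ["yes", "no"]
attributes = {"outlook": ["sunny", "rainy"], "windy": ["yes", "no"]}

oracle = CountOracle(rows)
model = learn_naive_bayes(oracle, "play", classes, attributes)
predicted = predict(model, classes, {"outlook": "sunny", "windy": "no"})
print(predicted, oracle.queries)
```

The `queries` counter shows what query minimization would target: this naive version issues one query for the total, plus one per class and one per (class, attribute, value) combination.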

international conference on move to meaningful internet systems | 2010

Identifying and eliminating inconsistencies in mappings across hierarchical ontologies

Bhavesh Sanghvi; Neeraj Koul; Vasant G. Honavar

Many applications require the establishment of mappings between ontologies. Such mappings are established by domain experts or automated tools. Errors in mappings can introduce inconsistencies in the resulting combined ontology. We consider the problem of identifying the largest consistent subset of mappings in hierarchical ontologies. We consider mappings that assert that a concept in one ontology is a subconcept, superconcept, or equivalent concept of a concept in another ontology and show that even in this simple setting, the task of identifying the largest consistent subset is NP-hard. We explore several polynomial-time algorithms, including a heuristic algorithm, for finding suboptimal solutions to this problem. We experimentally compare the algorithms using several synthetic as well as real-world ontologies and mappings.
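
The flavor of a polynomial-time, suboptimal approach can be sketched with a simple greedy pass. This is not one of the paper's algorithms. It assumes, for illustration only, that a set of mappings is inconsistent when the merged is-a graph makes two distinct concepts of the same ontology reachable from each other; the `A:`/`B:` prefixes and concept names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]          # (child, parent), read "child is-a parent"
Mapping = Tuple[str, str, str]  # (concept, relation, concept); relation in sub/super/equiv


def ontology_of(concept: str) -> str:
    """Hypothetical naming convention: 'A:Dog' belongs to ontology 'A'."""
    return concept.split(":", 1)[0]


def mapping_edges(m: Mapping) -> List[Edge]:
    """Translate a mapping into is-a edges of the merged graph."""
    a, rel, b = m
    if rel == "sub":
        return [(a, b)]
    if rel == "super":
        return [(b, a)]
    if rel == "equiv":
        return [(a, b), (b, a)]
    raise ValueError(f"unknown relation: {rel}")


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """All nodes reachable from start by following is-a edges."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_consistent(hierarchy: List[Edge], mappings: List[Mapping]) -> bool:
    """Assumed criterion: no two distinct same-ontology concepts collapse into a cycle."""
    graph: Dict[str, Set[str]] = {}
    edges = list(hierarchy)
    for m in mappings:
        edges.extend(mapping_edges(m))
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
    for node in graph:
        for other in reachable(graph, node):
            if (other != node and ontology_of(other) == ontology_of(node)
                    and node in reachable(graph, other)):
                return False
    return True


def greedy_consistent_subset(hierarchy: List[Edge], mappings: List[Mapping]) -> List[Mapping]:
    """Keep each mapping only if it stays consistent with those already kept."""
    kept: List[Mapping] = []
    for m in mappings:
        if is_consistent(hierarchy, kept + [m]):
            kept.append(m)
    return kept


hierarchy = [("A:Dog", "A:Animal"), ("B:Canine", "B:Mammal")]
mappings = [
    ("A:Dog", "equiv", "B:Canine"),
    ("A:Animal", "sub", "B:Canine"),    # would force A:Animal and A:Dog into a cycle
    ("A:Animal", "super", "B:Mammal"),
]
kept = greedy_consistent_subset(hierarchy, mappings)
print(kept)
```

The result depends on the order in which mappings are considered, so the subset is consistent but not necessarily the largest one, which matches the NP-hardness result stated in the abstract.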

bioinformatics and biomedicine | 2010

Scalable, updatable predictive models for sequence data

Neeraj Koul; Ngot Bui; Vasant G. Honavar

The emergence of data-rich domains has led to an exponential growth in the size and number of data repositories, offering exciting opportunities to learn from the data using machine learning algorithms. In particular, sequence data is being made available at a rapid rate. In many applications, the learning algorithm may not have direct access to the entire dataset for a variety of reasons, such as massive data size or bandwidth limitations. In such settings, there is a need for techniques that can learn predictive models (e.g., classifiers) from large datasets without direct access to the data. We describe an approach to learning from massive sequence datasets using statistical queries. Specifically, we show how Markov Models and Probabilistic Suffix Trees (PSTs) can be constructed from sequence databases that answer only a class of count queries. We analyze the query complexity (a measure of the number of queries needed) for constructing classifiers in such settings and outline some techniques to minimize the query complexity. We also show how some of the models can be updated in response to addition or deletion of subsets of sequences from the underlying sequence database.
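
As an illustration of building a Markov model from count queries alone, the sketch below estimates first-order transition probabilities from substring counts and then updates them when a sequence is added. It is a sketch under stated assumptions, not the paper's construction: `SequenceCountOracle`, the overlapping-occurrence count semantics, and the toy sequences are hypothetical, and PSTs are not covered.

```python
from itertools import product
from typing import Dict, Iterable


class SequenceCountOracle:
    """Hypothetical sequence database that only reports how often a substring occurs."""

    def __init__(self, sequences: Iterable[str]):
        self._sequences = list(sequences)
        self.queries = 0  # number of count queries issued so far

    def count(self, w: str) -> int:
        """Total number of (possibly overlapping) occurrences of w."""
        self.queries += 1
        return sum(
            1
            for s in self._sequences
            for i in range(len(s) - len(w) + 1)
            if s.startswith(w, i)
        )


def kgram_counts(oracle: SequenceCountOracle, alphabet: str, order: int) -> Dict[str, int]:
    """Query the count of every (order + 1)-gram over the alphabet."""
    return {
        "".join(gram): oracle.count("".join(gram))
        for gram in product(alphabet, repeat=order + 1)
    }


def transition_probs(counts: Dict[str, int], alphabet: str) -> Dict[str, float]:
    """probs[context + symbol] = P(symbol | context), for contexts that were observed."""
    probs: Dict[str, float] = {}
    for w, n in counts.items():
        context = w[:-1]
        total = sum(counts[context + c] for c in alphabet)
        if total > 0:
            probs[w] = n / total
    return probs


def merge_counts(old: Dict[str, int], delta: Dict[str, int], sign: int = 1) -> Dict[str, int]:
    """Counts are additive, so an addition (sign=1) or deletion (sign=-1) needs only the delta."""
    return {w: old[w] + sign * delta.get(w, 0) for w in old}


alphabet = "AB"
oracle = SequenceCountOracle(["ABAB", "AAB"])
counts = kgram_counts(oracle, alphabet, order=1)
probs = transition_probs(counts, alphabet)

# A new sequence arrives: count it on its own and merge, without re-querying the old data.
delta = kgram_counts(SequenceCountOracle(["BB"]), alphabet, order=1)
updated_counts = merge_counts(counts, delta)
updated_probs = transition_probs(updated_counts, alphabet)
print(probs, updated_probs)
```

The number of queries grows as the alphabet size raised to the power `order + 1`, which is the kind of cost a query-complexity analysis would aim to reduce.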

web intelligence | 2010

Learning in Presence of Ontology Mapping Errors

Neeraj Koul; Vasant G. Honavar

The widespread use of ontologies to associate semantics with data has resulted in a growing interest in the problem of learning predictive models from data sources that use different ontologies to model the same underlying domain (world of interest). Learning from such semantically disparate data sources involves the use of a mapping to resolve semantic disparity among the ontologies used. Often, in practice, the mapping used to resolve the disparity may contain errors, and as such the learning algorithms used in such a setting must be robust in the presence of mapping errors. We reduce the problem of learning from semantically disparate data sources in the presence of mapping errors to a variant of the problem of learning in the presence of nasty classification noise. This reduction allows us to transfer theoretical results and algorithms from the latter to the former.

international conference on tools with artificial intelligence | 2009

Design and Implementation of a Query Planner for Data Integration

Neeraj Koul; Vasant G. Honavar

Many applications require integrated access to multiple distributed, autonomous, and often semantically disparate data sources. Hence there is a need for bridging the semantic gap between the user and the data sources and for answering user queries based on the contents of multiple data sources. This paper describes a query planner that solves these two problems.

electro information technology | 2008

Complexes of on-line self assembly

Neeraj Koul; Jim Lathrop; Jack H. Lutz; Vasant G. Honavar

The Tile Assembly Model (TAM) is a mathematical model of nanoscale self-assembly. In this paper we use this model to define an on-line self-assembly model called Fair Online Assembly (FOAF) and a variation of it called Bounded Fair Online Assembly (FOAB). We show that these two models are not equivalent to each other. We also introduce the concepts of Binary and Trinary Complexes for a Tile Assembly System (TAS) and show that if the complexes have a special property, called the Frontier Turn Off Point (FTP), then the corresponding self-assemblies are FOAF. Finally, we argue that FOAF, FOAB, and the size of the T-frontier at the frontier turn off point may be used to measure the complexity of the TAS.

Archive | 2005