Ashwin Srinivasan

Birla Institute of Technology and Science

Publication

Featured research published by Ashwin Srinivasan.

Archive | 1997

Carcinogenesis Predictions Using Inductive Logic Programming

Ashwin Srinivasan; Ross D. King; Stephen Muggleton; Michael J. E. Sternberg

Obtaining accurate structural alerts for the causes of chemical cancers is a problem of great scientific and humanitarian value. This chapter builds on our earlier research, which demonstrated the use of Inductive Logic Programming (ILP) for predictions on the related problem of mutagenic activity among nitroaromatic molecules. Here we are concerned with predicting carcinogenic activity in rodent bioassays using data from the U.S. National Toxicology Program, conducted by the National Institute of Environmental Health Sciences. The 330 chemicals used here are significantly more diverse than those in the mutagenesis study, and form the basis for obtaining Structure-Activity Relationships (SARs) relating molecular structure to cancerous activity in rodents. We describe the use of the ILP system Progol to obtain SARs from these data. The rules obtained from Progol are comparable in accuracy to those from expert chemists, and more accurate than most state-of-the-art toxicity prediction methods. The rules can also be interpreted to give clues about the biological and chemical mechanisms of carcinogenesis, and they make use of rules learned by Progol for mutagenesis. Finally, we present details of, and predictions for, an ongoing international blind trial aimed specifically at comparing prediction methods.

Machine Learning | 2017

An empirical study of on-line models for relational data streams

Ashwin Srinivasan; Michael Bain

To date, Inductive Logic Programming (ILP) systems have largely assumed that all data needed for learning have been provided at the outset of model construction. Increasingly, in application areas like telecommunications, astronomy, text processing, financial markets and biology, data are generated by machines continuously and on a vast scale. We see at least four kinds of problems that this presents for ILP: (1) it may not be possible to store all of the data, even in secondary memory; (2) even if it were possible to store the data, it may be impractical to construct an acceptable model using partitioning techniques that repeatedly perform expensive coverage or subsumption tests on the data; (3) models constructed at some point may become less effective, or even invalid, as more data become available (exemplified by the “drift” problem when identifying concepts); and (4) the representation of the data instances may need to change as more data become available (a kind of “language drift” problem). In this paper, we investigate the adoption of a stream-based on-line learning approach to relational data. Specifically, we examine the representation of relational data in both an infinite-attribute setting and the usual fixed-attribute setting, and develop implementations that use ILP engines in combination with on-line model constructors. The behaviour of each program is investigated using a set of controlled experiments, and performance in practical settings is demonstrated by constructing complete theories for some of the largest biochemical datasets examined by ILP systems to date, including one with a million examples; to the best of our knowledge, this is the first time this has been empirically demonstrated with ILP on a real-world dataset.
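The combination described above, relational features feeding an on-line model constructor that never stores the stream, can be illustrated with a minimal sketch. It assumes each example arrives already converted into the set of relational features it satisfies (as an ILP engine might produce), and it uses a mistake-driven perceptron purely as a stand-in on-line learner; it is not the model constructor used in the paper. Keeping weights in a dictionary that grows as new feature names appear loosely mimics the infinite-attribute setting. The feature names and the `BIAS` key are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

BIAS = "__bias__"  # hypothetical reserved key for the bias weight

def score(weights: Dict[str, float], features: Set[str]) -> float:
    """Linear score; features never seen before contribute an implicit weight of 0."""
    return weights[BIAS] + sum(weights.get(f, 0.0) for f in features)

def train_online(stream: Iterable[Tuple[Set[str], int]], lr: float = 1.0) -> Dict[str, float]:
    """Mistake-driven perceptron over (active_features, label) pairs, label in {+1, -1}.

    Each example is used once and then discarded, so memory grows with the number
    of distinct features seen, not with the number of examples.
    """
    weights: Dict[str, float] = {BIAS: 0.0}
    for features, label in stream:
        if label * score(weights, features) <= 0:  # update only on mistakes
            for f in features:
                weights[f] = weights.get(f, 0.0) + lr * label
            weights[BIAS] += lr * label
    return weights

def predict(weights: Dict[str, float], features: Set[str]) -> int:
    return 1 if score(weights, features) > 0 else -1

# Toy stream with hypothetical feature names, replayed five times through a generator.
examples = [({"nitro", "ring"}, 1), ({"ring"}, -1), ({"nitro"}, 1), ({"ring", "methyl"}, -1)]
w = train_online(ex for _ in range(5) for ex in examples)
print(predict(w, {"nitro", "ring"}), predict(w, {"ring"}))  # 1 -1
```

Because the learner only ever sees one example at a time, the same code works whether the stream is a replayed list, as here, or an unbounded generator reading from a data source.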

Inductive Logic Programming | 2015

Identification of Transition Models of Biological Systems in the Presence of Transition Noise

Ashwin Srinivasan; Michael Bain; Deepika Vatsa; Sumeet Agarwal

The identification of transition models of biological systems (Petri net models, for example) in noisy environments has not been examined to any significant extent, although such models have been used to represent the ideal behaviour of metabolic, signalling and genetic networks. Progress has been made in identifying such models from sequences of qualitative states of the system and, more recently, with additional logical constraints as background knowledge. Both forms of model identification assume the data are correct, which is often unrealistic since biological systems are inherently stochastic. In this paper, we model the transition noise that can affect model identification as a Markov process whose transition functions are assumed to be known. We investigate the identification of transitions in a target model in the presence of this transition noise. The experiments are reconstructions of known networks from simulated data with varying amounts of transition noise added. In each case, the target model traces a specific trajectory through the state space. Model structures that explain the noisy state sequences are obtained based on recent work that formulates the identification of transition models as logical consequence-finding. With noisy data, we need to extend this formulation by allowing the abduction of new transitions. The resulting structures may be both incorrect and incomplete with respect to the target model. We quantify the ability to identify the transitions in the target model using probability estimates computed from transition sequences with PRISM. Empirical results suggest that we are able to correctly identify the transitions in the target model for transition noise levels ranging from low to high.
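The abstract computes probability estimates from transition sequences with PRISM. As a much simpler stand-in (not PRISM, and not the paper's abductive method), the sketch below shows the basic quantity involved: a maximum-likelihood estimate of each observed transition's probability, obtained by counting consecutive state pairs in noisy sequences. The state names and the 0.5 threshold are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

State = Hashable

def estimate_transitions(sequences: Sequence[Sequence[State]]) -> Dict[State, Dict[State, float]]:
    """Maximum-likelihood estimate of P(next state | current state) from observed sequences."""
    counts: Dict[State, Counter] = defaultdict(Counter)
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):  # consecutive state pairs
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }

def likely_transitions(probs: Dict[State, Dict[State, float]], threshold: float = 0.5) -> List[Tuple[State, State]]:
    """Transitions whose estimated probability reaches a (hypothetical) threshold."""
    return sorted(
        ((src, dst) for src, row in probs.items() for dst, p in row.items() if p >= threshold),
        key=repr,  # deterministic order even for states that are not mutually comparable
    )

# Three clean runs of a target trajectory A -> B -> C, plus one noisy run A -> D -> C.
runs = [["A", "B", "C"]] * 3 + [["A", "D", "C"]]
probs = estimate_transitions(runs)
print(probs["A"])                 # {'B': 0.75, 'D': 0.25}
print(likely_transitions(probs))  # [('A', 'B'), ('B', 'C'), ('D', 'C')]
```

Note that the noisy transition D -> C survives the threshold: once the noisy state D is reached, its only observed successor has conditional probability 1. A purely local estimate like this cannot tell such noise apart from the target trajectory, which is one reason a richer model of the noise process is needed.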

Inductive Logic Programming | 2012

Topic Models with Relational Features for Drug Design

Tanveer A. Faruquie; Ashwin Srinivasan; Ross D. King

To date, ILP models in drug design have largely focussed on models in first-order logic that relate the two- or three-dimensional molecular structure of a potential drug (a ligand) to its activity (for example, inhibition of some protein). In modelling terms: (a) the models have largely been logic-based (although there have been some attempts at probabilistic models); (b) the models have been mostly discriminative (they have mainly been used for classification tasks); and (c) data for the concepts to be learned are usually provided explicitly: “hidden” or latent concept learning is rare. Each of these aspects imposes certain limitations on the use of such models for drug design. Here, we propose the use of “topic models” (more precisely, hierarchical Bayesian models) as a general and powerful modelling technique for drug design. Specifically, we use the feature-construction capabilities of a general-purpose ILP system to incorporate complex relational information into topic models for drug-like molecules. Our main interest in this paper is to describe computational tools to assist the discovery of drugs for malaria. To this end, we describe the construction of topic models using the GlaxoSmithKline Tres Cantos Antimalarial (TCAMS) dataset. This consists of about 13,000 inhibitors of the 3D7 strain of P. falciparum in human erythrocytes, obtained by screening approximately 2 million compounds. We investigate the discrimination of molecules into groups (for example, “more active” and “less active”). For this task, we present evidence suggesting that when it is important to maximise the detection of molecules with high activity (“hits”), topic-based classifiers may be better than those that operate directly on the feature-space representation of the molecules. Besides their applicability to modelling anti-malarials, topic models also have an obvious utility as a technique for reducing the dimensionality of ILP-constructed feature spaces.

Conference on Information and Knowledge Management | 2017

Hybrid BiLSTM-Siamese network for FAQ Assistance

Prerna Khurana; Puneet Agarwal; Gautam Shroff; Lovekesh Vig; Ashwin Srinivasan

We describe an automated assistant for answering frequently asked questions; our system has been deployed, and is currently answering HR-related queries in two different areas (leave management and health insurance) for a large number of users. The needs of a large global corporation led us to model a frequently asked question (FAQ) as an equivalence class of actually asked questions for which there is a common answer (certified as being consistent with the organization's policy). When a new question is posed to our system, it finds the class of the question and responds with the answer for that class. The system is then either correct (gives the correct answer), incorrect (gives a wrong answer), or incomplete (says “I don't know”). We employ a hybrid deep-learning architecture in which a BiLSTM-based classifier is combined with a second, BiLSTM-based Siamese network in an iterative manner: questions for which the classifier makes an error during training are used to generate a set of misclassified question-question pairs. These, along with correct pairs, are used to train the Siamese network to drive apart the (hidden) representations of the misclassified pairs. We present experimental results from our deployment showing that our iteratively trained hybrid network: (a) performs better than either a classifier network alone or a Siamese network alone; (b) performs better than state-of-the-art sentence classifiers in the two areas in which it has been deployed, in terms of both accuracy and precision-recall tradeoff; and (c) also performs well on a public benchmark dataset. We also observe that using question-question pairs in our hybrid network results in marginally better performance than using question-to-answer pairs. Finally, estimates of precision and recall from the deployment of our automated assistant suggest that we can expect the burden on our HR department to drop from answering about 6000 queries a day to about 1000.
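The iterative scheme in the abstract can be sketched as follows: classifier errors become misclassified question-question pairs, which the Siamese network is trained to push apart. This is a hedged illustration, not the paper's code. The BiLSTM encoders are omitted, and the vectors passed to the loss stand in for their hidden representations. The standard margin-based contrastive loss is an assumption, since the abstract does not state which loss is used. The class names and questions are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str, int]  # (question, other question, 1 = same class, 0 = different class)

def pairs_from_errors(
    questions: Sequence[str],
    true_cls: Sequence[str],
    pred_cls: Sequence[str],
    examples_by_class: Dict[str, List[str]],
) -> List[Pair]:
    """Build Siamese training pairs from the questions the classifier got wrong."""
    pairs: List[Pair] = []
    for q, t, p in zip(questions, true_cls, pred_cls):
        if t == p:
            continue  # correctly classified: nothing to push apart
        # Dissimilar pairs: q against questions of the class it was wrongly assigned to.
        pairs += [(q, other, 0) for other in examples_by_class.get(p, [])]
        # Similar pairs: q against other questions from its true class.
        pairs += [(q, same, 1) for same in examples_by_class.get(t, []) if same != q]
    return pairs

def contrastive_loss(u: Sequence[float], v: Sequence[float], label: int, margin: float = 1.0) -> float:
    """Pull similar pairs together; push dissimilar pairs at least `margin` apart."""
    d = math.dist(u, v)
    return d * d if label == 1 else max(0.0, margin - d) ** 2

examples_by_class = {
    "leave": ["annual leave quota?", "how many leaves do I get"],
    "insurance": ["what does the policy cover?"],
}
pairs = pairs_from_errors(
    ["how many leaves do I get", "is my spouse covered"],
    ["leave", "insurance"],       # true classes
    ["insurance", "insurance"],   # classifier predictions (the first is an error)
    examples_by_class,
)
print(pairs)
```

Only the misclassified question generates pairs here; in an iterative setup these pairs would update the shared encoder, after which the classifier is retrained and re-checked for errors.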

Machine Learning | 2016

ILP-assisted de novo drug design

Rama Kaalia; Ashwin Srinivasan; Amit Kumar; Indira Ghosh

De novo design of drugs uses the three-dimensional structure of a target protein (often called the receptor) to design molecules (or ligands) that could bind to the receptor and hence inhibit its functioning. Thus, unlike a ligand-based approach, this form of drug design does not require prior knowledge of inhibitors. In this paper, the three-dimensional structure of a receptor is used indirectly, in the form of molecular interaction fields of the receptor and small molecules (or probes). In addition, we also use domain-specific constraints encoding basic geometric and pharmacological requirements imposed by the target. Interaction energies of one or more targets with a set of probes are used to identify three-dimensional constraints that occur in many (preferably all) targets. In graph-theoretic terms, the constraints are small, fixed-size cliques in graphs whose labelled vertices represent probe-specific points of high interaction energy, and whose edges are labelled with the three-dimensional distance between the corresponding points of interaction. Our interest is in the discovery of frequent cliques that satisfy domain-specific constraints. In the paper, such patterns are discovered using an Inductive Logic Programming (ILP) engine. The case for using ILP stems primarily from its explicit ways of incorporating domain constraints, but any other technique capable of discovering frequent cliques from data can be used with some additional effort. The frequent cliques discovered are used to hypothesize pharmacophore-like structures on potential ligands. We test the utility of this approach with a case study on the discovery of anti-malarials. Specifically, we test the approach on proteins belonging to the class of aspartic proteases. We are particularly interested in plasmepsin II, an enzyme in the haemoglobin degradation pathway of Plasmodium falciparum.
We assess the pharmacophore-like constraints using: (a) a database of known inhibitors and non-inhibitors of aspartic proteases; and (b) a database of decoys that are physico-chemically similar to the aspartic proteases. Our results suggest that the approach could be used to obtain pharmacophores with good precision and recall for aspartic proteases.
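The clique patterns described above have vertices labelled by probe type and edges labelled by three-dimensional distance, and a brute-force miner makes the idea concrete. The sketch below enumerates 3-cliques only and bins distances so they can be compared. It treats a maximum pairwise distance as a stand-in for a domain constraint such as active-site size, and it counts how many targets each pattern occurs in. It is a hypothetical illustration, not the ILP engine used in the paper. The probe labels, coordinates, bin width and thresholds are assumptions.

```python
import itertools
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Point = Tuple[str, Tuple[float, float, float]]               # (probe label, x/y/z coordinates)
Pattern = Tuple[Tuple[str, str, str], Tuple[int, int, int]]  # (vertex labels, binned edge lengths)

def triangle_pattern(triple: Sequence[Point], bin_width: float) -> Pattern:
    """Canonical form of a labelled 3-clique: the smallest encoding over all vertex orders."""
    encodings = []
    for a, b, c in itertools.permutations(triple):
        labels = (a[0], b[0], c[0])
        edges = tuple(int(math.dist(p[1], q[1]) // bin_width) for p, q in ((a, b), (a, c), (b, c)))
        encodings.append((labels, edges))
    return min(encodings)

def frequent_cliques(
    targets: Sequence[Sequence[Point]],
    bin_width: float = 1.0,
    max_dist: float = 10.0,
    min_support: int = 2,
) -> Dict[Pattern, int]:
    """Patterns that occur in at least `min_support` targets, with their support counts."""
    support: Counter = Counter()
    for points in targets:
        found = set()
        for triple in itertools.combinations(points, 3):
            # Hypothetical domain constraint: all three points fit inside the active site.
            if any(math.dist(p[1], q[1]) > max_dist for p, q in itertools.combinations(triple, 2)):
                continue
            found.add(triangle_pattern(triple, bin_width))
        support.update(found)  # each pattern counts at most once per target
    return {pattern: n for pattern, n in support.items() if n >= min_support}

site_1 = [("acceptor", (3, 0, 0)), ("donor", (0, 0, 0)), ("hydrophobic", (0, 4, 0))]
site_2 = [("donor", (10, 10, 10)), ("acceptor", (13, 10, 10)), ("hydrophobic", (10, 14, 10))]
site_3 = [("donor", (0, 0, 0)), ("donor", (1, 0, 0)), ("donor", (0, 1, 0))]
print(frequent_cliques([site_1, site_2, site_3]))
# {(('acceptor', 'donor', 'hydrophobic'), (3, 5, 4)): 2}
```

Binning makes patterns from different targets comparable, but distances lying close to a bin boundary can fall into different bins; a tolerance-based match, or the kind of explicit constraints an ILP engine allows, avoids that brittleness.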

Molecular Informatics | 2015

An Ab Initio Method for Designing Multi‐Target Specific Pharmacophores using Complementary Interaction Field of Aspartic Proteases

Rama Kaalia; Amit Kumar; Ashwin Srinivasan; Indira Ghosh

For the past few decades, a key objective of rational drug discovery has been the design of specific and selective ligands for target proteins. Infectious diseases like malaria are continuously becoming resistant to traditional medicines, which creates a need for new approaches to designing inhibitors for antimalarial targets. A novel method for the ab initio design of multi-target specific pharmacophores, using the interaction field maps of the active sites of multiple proteins, has been developed to design ‘specificity’ pharmacophores for aspartic proteases. The molecular interaction field grid maps of the active sites of aspartic proteases (plasmepsin II and IV from Plasmodium falciparum, plasmepsin from Plasmodium vivax, and pepsin and cathepsin D from human) are calculated, and common pharmacophoric features for favourable binding spots in the active sites are extracted in the form of graph cliques using inductive logic programming (ILP). Two pharmacophore ensembles are constructed from the largest common cliques by imposing the size of the receptor active site (L) and domain‐specific receptor‐ligand information (S). The overlap of chemical space between the two ensembles and the results of virtual screening of an inhibitor database with known activities show that this method can design efficient pharmacophores without prior ligand information.

Inductive Logic Programming | 2017

An Investigation into the Role of Domain-Knowledge on the Use of Embeddings

Lovekesh Vig; Ashwin Srinivasan; Michael Bain; Ankit Verma

Computing similarity in high-dimensional vector spaces is a long-standing problem that has recently seen significant progress with the invention of the word2vec algorithm. Usually, it has been found that using an embedded representation results in much better performance for the task being addressed. It is not known whether embeddings can similarly improve performance with data of the kind considered by Inductive Logic Programming (ILP), in which data that appear dissimilar on the surface can be similar to each other given domain (background) knowledge. In this paper, using several ILP classification benchmarks, we investigate whether embedded representations are similarly helpful for problems where there are sufficient amounts of background knowledge. We use tasks for which we have domain expertise about the relevance of the available background knowledge, and consider two subsets of background predicates (“sufficient” and “insufficient”). For each subset, we obtain a baseline representation consisting of Boolean-valued relational features. Next, a vector embedding specifically designed for classification is obtained. Finally, we examine the predictive performance of widely used classification methods with and without the embedded representation. With sufficient background knowledge, we find no statistical evidence of improved performance with an embedded representation. With insufficient background knowledge, our results provide empirical evidence that, for the specific case of deep networks, an embedded representation could be useful.

Archive | 2016

Searching for Logical Patterns in Multi-sensor Data from the Industrial Internet

Mohit Yadav; Ehtesham Hassan; Gautam Shroff; Puneet Agarwal; Ashwin Srinivasan

Engineers analysing large volumes of multi-sensor data from vehicles, engines, etc. often need to search for events such as “hard-stops”, “lane passing” or “engine overload”. Apart from such visual analysis for engineering purposes, manufacturers also need to count occurrences of such events via on-board monitoring sensors that ideally rely on classifiers; searching for patterns in available data is also useful for preparing training sets in this context. In this paper, we propose a method for searching for multi-sensor patterns in large volumes of sensor data using qualitative symbols such as “steady”, “increasing” and “decreasing”, as in QSIM (Say, Functions representable in pure QSIM, 251–255, 1996, [1]). Patterns can include symbol sequences for multiple sensors, as well as approximate duration, level or slope values. Logical symbols are extracted from multi-sensor time series and registered in a trie-based index structure. We demonstrate the effectiveness of our retrieval and ranking technique on real-life vehicular sensor data, in both visual analytics and classifier training and detection scenarios.
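The idea of turning a sensor trace into qualitative symbols and indexing the resulting symbol sequences in a trie can be sketched for a single sensor. The slope tolerance `eps`, the choice to index every suffix of the segment sequence, and all names below are assumptions for illustration. The paper's multi-sensor index, its ranking, and its duration, level and slope constraints are not reproduced.

```python
from typing import Dict, List, Sequence, Tuple

def qualitative_segments(series: Sequence[float], eps: float = 0.1) -> List[Tuple[str, int]]:
    """Run-length-encode a time series into (symbol, duration) segments."""
    segments: List[Tuple[str, int]] = []
    for prev, cur in zip(series, series[1:]):
        delta = cur - prev
        if abs(delta) <= eps:
            sym = "steady"
        else:
            sym = "increasing" if delta > 0 else "decreasing"
        if segments and segments[-1][0] == sym:
            segments[-1] = (sym, segments[-1][1] + 1)  # extend the current run
        else:
            segments.append((sym, 1))
    return segments

class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.hits: List[Tuple[str, int]] = []  # (series id, index of first matching segment)

class SymbolIndex:
    """Trie over every suffix of each series' symbol sequence."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, series_id: str, segments: List[Tuple[str, int]]) -> None:
        symbols = [sym for sym, _ in segments]
        for start in range(len(symbols)):
            node = self.root
            for sym in symbols[start:]:
                node = node.children.setdefault(sym, TrieNode())
                node.hits.append((series_id, start))

    def search(self, pattern: Sequence[str]) -> List[Tuple[str, int]]:
        node = self.root
        for sym in pattern:
            if sym not in node.children:
                return []
            node = node.children[sym]
        return node.hits

speed = [0, 5, 10, 10, 10, 2]  # hypothetical trace: accelerate, cruise, then brake sharply
index = SymbolIndex()
index.add("truck-1", qualitative_segments(speed))
print(qualitative_segments(speed))             # [('increasing', 2), ('steady', 2), ('decreasing', 1)]
print(index.search(["steady", "decreasing"]))  # [('truck-1', 1)]
```

Indexing all suffixes makes any contiguous symbol pattern answerable with a single walk down the trie. The trade-off is storage that grows quadratically with the number of segments per series; a suffix tree is the usual compact alternative.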

Inductive Logic Programming | 2018

Large-Scale Assessment of Deep Relational Machines

Tirtharaj Dash; Ashwin Srinivasan; Lovekesh Vig; Oghenejokpeme I. Orhobor; Ross D. King

Deep Relational Machines (DRMs) present a simple way of incorporating complex domain knowledge into deep networks. In a DRM this knowledge is introduced through relational features: in the original formulation of [1], the features are selected by an ILP engine using domain knowledge encoded as logic programs. More recently, in [2], DRMs appear to achieve good performance without the need for feature selection by an ILP engine (the features are simply drawn randomly from a space of relevant features). The reports so far on DRMs, though, have been deficient on three counts: (a) they have been tested on very small amounts of data (7 datasets, not all independent, with only a few thousand instances altogether); (b) the background knowledge involved has been modest, involving a few tens of predicates; and (c) performance assessment has been only on classification tasks. In this paper we rectify each of these shortcomings by testing on datasets from the biochemical domain involving hundreds of thousands of instances; industrial-strength background predicates involving multiple hierarchies of complex definitions; and both classification and regression tasks. Our results provide substantially reliable evidence of the predictive capabilities of DRMs, along with a significant improvement in predictive performance with the incorporation of domain knowledge. We propose the new datasets and results as updated benchmarks for comparative studies in neural-symbolic modelling.
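In the formulation of [2] summarized above, DRM inputs are Boolean relational features drawn at random from a space of relevant features. A toy sketch of that input construction follows. The molecule encoding and the feature space are hypothetical simplifications of ILP-defined features: there is one feature per unordered pair of elements, true when the molecule has a bond between atoms of those elements. The deep network that would consume the matrix is omitted.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[str, str]  # hypothetical feature: "has a bond between an X atom and a Y atom"

def has_bond(mol: Dict, feature: Feature) -> bool:
    """True if some bond in the toy molecule joins atoms of the two given elements."""
    atoms = mol["atoms"]
    want = tuple(sorted(feature))
    return any(tuple(sorted((atoms[a], atoms[b]))) == want for a, b in mol["bonds"])

def sample_features(elements: Sequence[str], k: int, seed: int = 0) -> List[Feature]:
    """Draw k features uniformly at random, without replacement, from the feature space."""
    space = list(itertools.combinations_with_replacement(sorted(elements), 2))
    return random.Random(seed).sample(space, min(k, len(space)))

def feature_matrix(mols: Sequence[Dict], features: Sequence[Feature]) -> List[List[int]]:
    """One row per molecule, one 0/1 column per feature: the network's input."""
    return [[int(has_bond(m, f)) for f in features] for m in mols]

# Hypothetical toy molecules given as atom-element maps plus bond lists.
mol_a = {"atoms": {"a1": "C", "a2": "N"}, "bonds": [("a1", "a2")]}
mol_b = {"atoms": {"a1": "O", "a2": "H", "a3": "H"}, "bonds": [("a1", "a2"), ("a1", "a3")]}
print(feature_matrix([mol_a, mol_b], [("N", "C"), ("H", "O"), ("C", "C")]))  # [[1, 0, 0], [0, 1, 0]]
features = sample_features(["C", "N", "O"], k=4)  # random subset of the 6 possible pairs
X = feature_matrix([mol_a, mol_b], features)
```

Fixing the seed makes the random feature draw reproducible, which matters when comparing a randomly drawn feature set against one selected by an ILP engine.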