Maulik R. Kamdar
Stanford University
Publication
Featured research published by Maulik R. Kamdar.
Journal of Web Semantics | 2014
Muhammad Saleem; Maulik R. Kamdar; Aftab Iqbal; Shanmukha Sampath; Helena F. Deus; Axel-Cyrille Ngonga Ngomo
The amount of bio-medical data available on the Web grows exponentially with time. The resulting large volume of data makes manual exploration very tedious. Moreover, the velocity at which this data changes and the variety of formats in which bio-medical data is published makes it difficult to access them in an integrated form. Finally, the lack of an integrated vocabulary makes querying this data more difficult. In this paper, we advocate the use of Linked Data to integrate, query and visualize bio-medical data. The resulting Big Linked Data allows discovering knowledge distributed across manifold sources, making it viable for the serendipitous discovery of novel knowledge. We present the concept of Big Linked Data by showing how the constant stream of new bio-medical publications can be integrated with the Linked Cancer Genome Atlas dataset (TCGA) within a virtual integration scenario. We ensure the scalability of our approach through the novel TopFed federated query engine, which we evaluate by comparing the query execution time of our system with that of FedX on Linked TCGA. Then, we show how we can harness the value hidden in the underlying integrated data by making it easier to explore through a user-friendly interface. We evaluate the usability of the interface by using the standard system usability questionnaire as well as a customized questionnaire designed for the users of our system. Our overall result of 77 suggests that our interface is easy to use and can thus lead to novel insights.
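The usability figure quoted above can be made concrete with a small scoring sketch. It assumes the "standard system usability questionnaire" is the ten-item System Usability Scale (SUS), which the abstract does not state explicitly; the function names and the sample answers are hypothetical and are not data from the study.

```python
from statistics import mean


def sus_score(responses):
    """Return the SUS score (0-100) for one respondent.

    `responses` holds the ten answers in questionnaire order, each an
    integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs ten answers, each an integer from 1 to 5")
    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribute answer - 1.
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribute 5 - answer.
            total += 5 - answer
    # The 0-40 raw total is rescaled to 0-100.
    return total * 2.5


def mean_sus(all_responses):
    """Average the per-respondent SUS scores (hypothetical helper)."""
    return mean(sus_score(r) for r in all_responses)


if __name__ == "__main__":
    # Made-up answers from two imaginary respondents.
    sample = [
        [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
        [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    ]
    print(mean_sus(sample))
```

Under this assumption, an overall score is the mean of the per-respondent scores, so a reported value such as 77 would sit on the same 0-100 scale.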
International Semantic Web Conference | 2014
Ali Hasnain; Maulik R. Kamdar; Panagiotis Hasapis; Dimitris Zeginis; Claude N. Warren; Helena F. Deus; Dimitrios Ntalaperas; Konstantinos A. Tarabanis; Muntazir Mehdi; Stefan Decker
The increase in the volume and heterogeneity of biomedical data sources has motivated researchers to embrace Linked Data (LD) technologies to solve the ensuing integration challenges and enhance information discovery. As an integral part of the EU GRANATUM project, a Linked Biomedical Dataspace (LBDS) was developed to semantically interlink data from multiple sources and augment the design of in silico experiments for cancer chemoprevention drug discovery. The different components of the LBDS enable both bioinformaticians and biomedical researchers to publish, link, query and visually explore the heterogeneous datasets. We have extensively evaluated the usability of the entire platform. In this paper, we showcase three different workflows depicting real-world scenarios in which domain users employ the LBDS to intuitively retrieve meaningful information from the integrated sources. We report the important lessons that we learned through the challenges encountered and our accumulated experience during the collaborative processes, which should make it easier for LD practitioners to create such dataspaces in other domains. We also provide a concise set of generic recommendations to develop LD platforms useful for drug discovery.
Journal of Biomedical Informatics | 2014
Maulik R. Kamdar; Dimitris Zeginis; Ali Hasnain; Stefan Decker; Helena F. Deus
Bioinformatics research relies heavily on the ability to discover and correlate data from various sources. The specialization of life sciences over the past decade, coupled with an increasing number of biomedical datasets available through standardized interfaces, has created opportunities for new methods in biomedical discovery. Despite the popularity of semantic web technologies in tackling the integrative bioinformatics challenge, there are many obstacles to their use by non-technical research audiences. In particular, fully exploiting integrated information requires improved interactive methods that are intuitive to biomedical experts. In this report we present ReVeaLD (a Real-time Visual Explorer and Aggregator of Linked Data), a user-centered visual analytics platform devised to increase intuitive interaction with data from distributed sources. ReVeaLD facilitates query formulation using a domain-specific language (DSL) identified by biomedical experts and mapped to a self-updated catalogue of elements from external sources. ReVeaLD was implemented in a cancer research setting; queries included retrieving data from in silico experiments, protein modeling and gene expression. ReVeaLD was developed using Scalable Vector Graphics and JavaScript and a demo with explanatory video is available at http://www.srvgal78.deri.ie:8080/explorer. A set of user-defined graphic rules controls the display of information through media-rich user interfaces. Evaluation of ReVeaLD was carried out as a game: biomedical researchers were asked to assemble a set of 5 challenge questions, and the time taken and interactions with the platform were recorded. Preliminary results indicate that complex queries could be formulated in under two minutes by unskilled researchers. The results also indicate that supporting the identification of the elements of a DSL significantly increased the intuitiveness of the platform and the usability of semantic web technologies by domain users.
International Semantic Technology Conference | 2014
Ali Hasnain; Syeda Sana e Zainab; Maulik R. Kamdar; Qaiser Mehmood; Claude N. Warren; Qurratal Ain Fatimah; Helena F. Deus; Muntazir Mehdi; Stefan Decker
Multiple datasets that add high value to biomedical research have been exposed on the web as a part of the Life Sciences Linked Open Data (LSLOD) Cloud. The ability to easily navigate through these datasets is crucial for personalized medicine and the improvement of the drug discovery process. However, navigating these multiple datasets is not trivial as most of these are only available as isolated SPARQL endpoints with very little vocabulary reuse. The content that is indexed through these endpoints is scarce, making the indexed dataset opaque for users. In this paper, we propose an approach for the creation of an active Linked Life Sciences Data Roadmap, a set of configurable rules which can be used to discover links (roads) between biological entities (cities) in the LSLOD cloud. We have catalogued and linked concepts and properties from 137 public SPARQL endpoints. Our Roadmap is primarily used to dynamically assemble queries retrieving data from multiple SPARQL endpoints simultaneously. We also demonstrate its use in conjunction with other tools for selective SPARQL querying, semantic annotation of experimental datasets and the visualization of the LSLOD cloud. We have evaluated the performance of our approach in terms of the time taken and entity capture. Our approach, if generalized to encompass other domains, can be used for road-mapping the entire LOD cloud.
Sprachwissenschaft | 2017
Maulik R. Kamdar; Tania Tudorache; Mark A. Musen
Reusing ontologies and their terms is a principle and best practice that most ontology development methodologies strongly encourage. Reuse comes with the promise to support the semantic interoperability and to reduce engineering costs. In this paper, we present a descriptive study of the current extent of term reuse and overlap among biomedical ontologies. We use the corpus of biomedical ontologies stored in the BioPortal repository, and analyze different types of reuse and overlap constructs. While we find an approximate term overlap between 25-31%, the term reuse is only <9%, with most ontologies reusing fewer than 5% of their terms from a small set of popular ontologies. Clustering analysis shows that the terms reused by a common set of ontologies have >90% semantic similarity, hinting that ontology developers tend to reuse terms that are sibling or parent-child nodes. We validate this finding by analysing the logs generated from a Protégé plugin that enables developers to reuse terms from BioPortal. We find most reuse constructs were 2-level subtrees on the higher levels of the class hierarchy. We developed a Web application that visualizes reuse dependencies and overlap among ontologies, and that proposes similar terms from BioPortal for a term of interest. We also identified a set of error patterns that indicate that ontology developers did intend to reuse terms from other ontologies, but that they were using different and sometimes incorrect representations. Our results stipulate the need for semi-automated tools that augment term reuse in the ontology engineering process through personalized recommendations.
International World Wide Web Conferences | 2017
Maulik R. Kamdar; Mark A. Musen
Integrated approaches for pharmacology are required for the mechanism-based predictions of adverse drug reactions that manifest due to concomitant intake of multiple drugs. These approaches require the integration and analysis of biomedical data and knowledge from multiple, heterogeneous sources with varying schemas, entity notations, and formats. To tackle these integrative challenges, the Semantic Web community has published and linked several datasets in the Life Sciences Linked Open Data (LSLOD) cloud using established W3C standards. We present the PhLeGrA platform for Linked Graph Analytics in Pharmacology in this paper. Through query federation, we integrate four sources from the LSLOD cloud and extract a drug-reaction network, composed of distinct entities. We represent this graph as a hidden conditional random field (HCRF), a discriminative latent variable model that is used for structured output predictions. We calculate the underlying probability distributions in the drug-reaction HCRF using the datasets from the U.S. Food and Drug Administration's Adverse Event Reporting System. We predict the occurrence of 146 adverse reactions due to multiple drug intake with an AUROC statistic greater than 0.75. The PhLeGrA platform can be extended to incorporate other sources published using Semantic Web technologies, as well as to discover other types of pharmacological associations.
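For readers unfamiliar with the AUROC statistic reported above, the following minimal sketch computes it from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. It is an illustration only, not the evaluation code of the PhLeGrA platform; the function name and the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` are 1 (reaction occurred) or 0 (did not occur);
    `scores` are the model's predicted scores in the same order.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example, not data from the paper.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the threshold of 0.75 quoted in the abstract lies midway between the two.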
Pacific Symposium on Biocomputing | 2016
Maulik R. Kamdar; Michelle J. Wu
Neuropsychiatric disorders are the leading cause of disability worldwide and there is no gold standard currently available for the measurement of mental health. This issue is exacerbated by the fact that the information physicians use to diagnose these disorders is episodic and often subjective. Current methods to monitor mental health involve the use of subjective DSM-5 guidelines, and advances in EEG and video monitoring technologies have not been widely adopted due to invasiveness and inconvenience. Wearable technologies have surfaced as a ubiquitous and unobtrusive method for providing continuous, quantitative data about a patient. Here, we introduce PRISM (Passive, Real-time Information for Sensing Mental Health). This platform integrates motion, light and heart rate data from a smart watch application with user interactions and text entries from a web application. We have demonstrated a proof of concept by collecting preliminary data through a pilot study of 13 subjects. We have engineered appropriate features and applied both unsupervised and supervised learning to develop models that are predictive of user-reported ratings of their emotional state, demonstrating that the data has the potential to be useful for evaluating mental health. This platform could allow patients and clinicians to leverage continuous streams of passive data for early and accurate diagnosis as well as constant monitoring of patients suffering from mental disorders.
Database | 2015
Maulik R. Kamdar; Michel Dumontier
Ebola virus (EBOV), a member of the family Filoviridae, is a NIAID category A, lethal human pathogen. It causes Ebola virus disease (EVD), a severe hemorrhagic fever with a cumulative death rate of 41% in the ongoing epidemic in West Africa. There is an ever-increasing need to consolidate and make available all the knowledge that we possess on EBOV, even if it is conflicting or incomplete. This would enable biomedical researchers to understand the molecular mechanisms underlying this disease and help develop tools for efficient diagnosis and effective treatment. In this article, we present our approach for the development of an Ebola virus-centered Knowledge Base (Ebola-KB) using Linked Data and Semantic Web Technologies. We retrieve and aggregate knowledge from several open data sources, web services and biomedical ontologies. This knowledge is transformed to RDF, linked to the Bio2RDF datasets and made available through a SPARQL 1.1 Endpoint. Ebola-KB can also be explored using an interactive Dashboard visualizing the different perspectives of this integrated knowledge. We showcase how different competency questions, asked by domain users researching the druggability of EBOV, can be formulated as SPARQL Queries or answered using the Ebola-KB Dashboard. Database URL: http://ebola.semanticscience.org.
Protein and Peptide Letters | 2012
Prabuddha Dey; Maulik R. Kamdar; Santi M. Mandal; Mrinal K. Maiti
An extracellular antifungal protein of 28 kDa (exAFP-C28) was identified from an endophytic fungus, Colletotrichum sp. DM-06. After purification, the MIC value of exAFP-C28 against Candida albicans, a well-known human pathogenic fungus, was found to be 32 μg/mL, a concentration that did not affect human red blood cells. The antifungal activity associated with exAFP-C28 was manifested by the increased membrane permeability of C. albicans cells followed by disruption. Proteomics and bioinformatics analyses revealed that several peptide fragments of exAFP-C28 have identity with the bacterial 50S ribosomal protein L10, and a stretch of 55 amino acids of two peptide fragments corresponding to the N-terminus of L10 protein is capable of forming an amphipathic helix required for membrane penetration. Taken together, our results suggest that the exAFP-C28 protein from Colletotrichum sp. DM-06 is a promising therapeutic agent for controlling candidiasis in animals, including humans.
Proceedings of the Pacific Symposium | 2018
Lichy Han; Maulik R. Kamdar
Glioblastoma Multiforme (GBM), a malignant brain tumor, is among the most lethal of all cancers. Temozolomide is the primary chemotherapy treatment for patients diagnosed with GBM. The methylation status of the promoter or the enhancer regions of the O6-methylguanine methyltransferase (MGMT) gene may impact the efficacy and sensitivity of temozolomide, and hence may affect overall patient survival. Microscopic genetic changes may manifest as macroscopic morphological changes in the brain tumors that can be detected using magnetic resonance imaging (MRI), which can serve as noninvasive biomarkers for determining methylation of MGMT regulatory regions. In this research, we use a compendium of brain MRI scans of GBM patients collected from The Cancer Imaging Archive (TCIA) combined with methylation data from The Cancer Genome Atlas (TCGA) to predict the methylation state of the MGMT regulatory regions in these patients. Our approach relies on a bi-directional convolutional recurrent neural network architecture (CRNN) that leverages the spatial aspects of these 3-dimensional MRI scans. Our CRNN obtains an accuracy of 67% on the validation data and 62% on the test data, with precision and recall both at 67%, suggesting the existence of MRI features that may complement existing markers for GBM patient stratification and prognosis. We have additionally presented our model via a novel neural network visualization platform, which we have developed to improve interpretability of deep learning MRI-based classification models.