Publication


Featured research published by Parikshit Sondhi.


European Conference on Information Retrieval | 2012

Reliability prediction of webpages in the medical domain

Parikshit Sondhi; V. G. Vinod Vydiswaran; ChengXiang Zhai

In this paper, we study how to automatically predict the reliability of web pages in the medical domain. Assessing the reliability of online medical information is especially critical because it may influence vulnerable patients seeking help online. Unfortunately, no automated systems are currently available that can classify a medical webpage as reliable, and manual assessment cannot scale to the large number of medical pages on the Web. We propose a supervised learning approach to automatically predict the reliability of medical webpages. We developed a gold standard dataset using the standard reliability criteria defined by the Health on the Net Foundation and systematically experimented with different link-based and content-based feature sets. Our experiments show promising results, with prediction accuracies of over 80%. We also show that our proposed prediction method is useful in applications such as reliability-based re-ranking and automatic website accreditation.


Knowledge Discovery and Data Mining | 2012

SympGraph: a framework for mining clinical notes through symptom relation graphs

Parikshit Sondhi; Jimeng Sun; Hanghang Tong; ChengXiang Zhai

As an integral part of Electronic Health Records (EHRs), clinical notes pose special challenges for analyzing EHRs due to their unstructured nature. In this paper, we present a general mining framework, SympGraph, for modeling and analyzing symptom relationships in clinical notes. A SympGraph has symptoms as nodes and co-occurrence relations between symptoms as edges, and can be constructed automatically by extracting symptoms from sequences of clinical notes for a large number of patients. We present an important clinical application of SympGraph: symptom expansion, which expands a given set of symptoms to other related symptoms by analyzing the underlying SympGraph structure. We further propose a matrix update algorithm that provides significant computational savings for dynamic updates to the graph. Comprehensive evaluation on 1 million longitudinal clinical notes from 13K patients shows that static symptom expansion can successfully expand a set of known symptoms of a disease with a high rate of agreement with physician input (average precision 0.46), a 31% improvement over baseline co-occurrence-based methods. The experimental results also show that the expanded symptoms can serve as useful features for improving the AUC measure for disease diagnosis prediction, thus confirming the potential clinical value of our work.
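The graph structure described in this abstract (symptoms as nodes, co-occurrence as weighted edges) can be illustrated with a small sketch. This is not the SympGraph expansion algorithm or its matrix update; it is a plain co-occurrence ranking, closer to the baseline the abstract compares against. The function names, the toy notes, and the `top_k` parameter are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(notes):
    """Count how often each unordered symptom pair appears in the same note."""
    edges = Counter()
    for symptoms in notes:
        # sorted() makes (a, b) and (b, a) map to the same edge key
        for a, b in combinations(sorted(set(symptoms)), 2):
            edges[(a, b)] += 1
    return edges


def expand_symptoms(edges, seeds, top_k=3):
    """Rank non-seed symptoms by total co-occurrence weight with the seeds."""
    seeds = set(seeds)
    scores = Counter()
    for (a, b), weight in edges.items():
        if a in seeds and b not in seeds:
            scores[b] += weight
        elif b in seeds and a not in seeds:
            scores[a] += weight
    return [symptom for symptom, _ in scores.most_common(top_k)]


# Toy, made-up notes: each note is the list of symptoms extracted from it.
toy_notes = [
    ["cough", "fever", "fatigue"],
    ["cough", "fever"],
    ["fever", "headache"],
    ["cough", "wheezing"],
]
graph = build_cooccurrence_graph(toy_notes)
expanded = expand_symptoms(graph, ["cough"], top_k=2)
```

In this toy data, "fever" co-occurs with "cough" in two notes, so it ranks first among the expansion candidates.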


Eurasip Journal on Bioinformatics and Systems Biology | 2007

Question processing and clustering in INDOC: a biomedical question answering system

Parikshit Sondhi; Purushottam Raj; Vinod Kumar; Ankush Mittal

The exponential growth in the volume of publications in the biomedical domain has made it impossible for an individual to keep pace with the advances. Even though evidence-based medicine has gained wide acceptance, physicians are unable to access the relevant information in the required time, leaving most of their questions unanswered. This accentuates the need for fast and accurate biomedical question answering systems. In this paper we introduce INDOC, a biomedical question answering system based on novel ideas for indexing and for extracting the answers to the questions posed. INDOC displays the results in clusters to help the user arrive at the most relevant set of documents quickly. Evaluation was done against the standard OHSUMED test collection. Our system achieves high accuracy and minimizes user effort.


Journal of the American Medical Informatics Association | 2012

Leveraging medical thesauri and physician feedback for improving medical literature retrieval for case queries

Parikshit Sondhi; Jimeng Sun; ChengXiang Zhai; Robert Sorrentino; Martin S. Kohn

OBJECTIVE: This paper presents a study of methods for medical literature retrieval for case queries, in which the goal is to retrieve literature articles similar to a given patient case. In particular, it focuses on analyzing the performance of state-of-the-art general retrieval methods and improving them through the use of medical thesauri and physician feedback.

MATERIALS AND METHODS: The Kullback-Leibler divergence retrieval model with Dirichlet smoothing is used as the state-of-the-art general retrieval method. Pseudorelevance feedback and term weighting methods are proposed that leverage the MeSH and UMLS thesauri. Evaluation is performed on a test collection recently created for the ImageCLEF medical case retrieval challenge.

RESULTS: Experimental results show that a well-tuned state-of-the-art general retrieval model achieves a mean average precision of 0.2754, but the performance can be improved by over 40%, to 0.3980, through the proposed methods.

DISCUSSION: The results on the ImageCLEF test collection, which is currently the best collection available for the task, are encouraging. There are, however, limitations due to the small size of the evaluation set. The analysis shows that further refinement of the methods is necessary before they can be really useful in a clinical setting.

CONCLUSION: Medical case-based literature retrieval is a critical search application that presents a number of unique challenges. This analysis shows that state-of-the-art general retrieval models are reasonably good for the task, but the performance can be significantly improved by developing new task-specific retrieval models that incorporate medical thesauri and physician feedback.
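The baseline named in this abstract, KL-divergence retrieval with Dirichlet smoothing, can be sketched in a few lines. When the query model is the plain maximum-likelihood estimate, KL-divergence ranking is equivalent to ranking by query log-likelihood, which is what the sketch computes. It does not include the thesaurus-based feedback or term weighting the paper proposes. The function name, the toy documents, and the value of `mu` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def dirichlet_score(query_terms, doc_terms, collection_tf, collection_len, mu=2000.0):
    """Log query likelihood of a document under Dirichlet prior smoothing."""
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_coll = collection_tf.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # term unseen in the whole collection: skip it
        # Smoothed estimate: document count plus mu pseudo-counts from the collection
        p_doc = (doc_tf[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_doc)
    return score


# Toy, made-up collection; mu is set small because the documents are tiny.
docs = {
    "d1": "chest pain and shortness of breath".split(),
    "d2": "knee pain after running".split(),
}
collection_tf = Counter(term for terms in docs.values() for term in terms)
collection_len = sum(len(terms) for terms in docs.values())
query = ["chest", "pain"]
scores = {
    name: dirichlet_score(query, terms, collection_tf, collection_len, mu=10.0)
    for name, terms in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

The smoothing parameter `mu` controls how strongly short documents are pulled toward the collection statistics; the abstract's "well-tuned" baseline presumably refers to tuning parameters of this kind, though the abstract does not say which.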


International Conference on Bioinformatics | 2014

Resolving healthcare forum posts via similar thread retrieval

Jason H. D. Cho; Parikshit Sondhi; ChengXiang Zhai; Bruce R. Schatz

Web communities such as healthcare web forums serve as popular platforms for users to get their complex medical queries resolved. A typical forum thread contains a query in its first post and a discussion around it in subsequent posts. However, many users do not receive satisfactory responses from other members of the community, leaving them dissatisfied. We propose to help these users by exploiting an existing collection of discussion threads. Often many users suffer from the same medical condition and start multiple discussion threads on very similar queries. In this paper we develop and evaluate a range of specialized search methods that treat an entire unresolved forum post as a query and retrieve forum threads discussing similar problems to help resolve it. The task is more challenging than a traditional document retrieval problem, since forum posts can contain a lot of irrelevant background information. The discussion threads to be retrieved are also quite different from traditional unstructured text documents. We evaluate our results on a dataset comprising over 350K discussion threads and show that our proposed methods outperform state-of-the-art retrieval methods for the task. In particular, methods based on non-uniform weighting of thread posts and semantic analysis of the query text perform quite well.


Conference on Information and Knowledge Management | 2014

Mining Semi-Structured Online Knowledge Bases to Answer Natural Language Questions on Community QA Websites

Parikshit Sondhi; ChengXiang Zhai

Over the past few years, community QA websites (e.g., Yahoo! Answers) have become a useful platform for users to post questions and obtain answers. However, not all questions posted there receive informative answers or are answered in a timely manner. In this paper, we show that the answers to some of these questions are available in online domain-specific knowledge bases and propose an approach to automatically discover those answers. In the proposed approach, we first mine appropriate SQL query patterns by leveraging an existing collection of QA pairs, and then use the learned query patterns to answer previously unseen questions by returning relevant entities from the knowledge base. Evaluation on a collection of health domain questions from Yahoo! Answers shows that the proposed method is effective in discovering potential answers to user questions from an online medical knowledge base.


International Conference on the Theory of Information Retrieval | 2013

Exploiting Forum Thread Structures to Improve Thread Clustering

Kumaresh Pattabiraman; Parikshit Sondhi; ChengXiang Zhai

Automated clustering of threads within and across web forums will greatly benefit both users and forum administrators in efficiently seeking, managing, and integrating the huge volume of content being generated. While clustering has been studied for other types of data, little work has been done on clustering forum threads; the informal nature and special structure of forum data make it interesting to study how to effectively cluster forum threads. In this paper, we apply three state-of-the-art clustering methods (i.e., hierarchical agglomerative clustering, k-means, and probabilistic latent semantic analysis) to cluster forum threads and study how to leverage the structure of threads to improve clustering accuracy. We propose three different methods for assigning weights to the posts in a forum thread to achieve a more accurate representation of a thread. We evaluate all the methods on data collected from three different Linux forums for both within-forum and across-forum clustering. Our results show that the state-of-the-art methods perform reasonably well for this task, but the performance can be further improved by exploiting thread structures. In particular, a parabolic weighting method that assigns higher weights to both the beginning posts and the end posts of a thread is shown to consistently outperform a standard clustering method.
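The parabolic weighting idea mentioned at the end of this abstract can be illustrated with a short sketch. The abstract does not give the weighting formula, so the quadratic below, the `floor` parameter, and the function names are assumptions; the sketch only shows one way to give the first and last posts of a thread more influence on its bag-of-words representation than the posts in the middle.

```python
from collections import Counter


def parabolic_weights(num_posts, floor=0.2):
    """Weights that peak at the first and last post and dip to `floor` mid-thread."""
    if num_posts == 1:
        return [1.0]
    weights = []
    for i in range(num_posts):
        x = i / (num_posts - 1)  # post position scaled to [0, 1]
        weights.append(floor + (1.0 - floor) * (2.0 * x - 1.0) ** 2)
    return weights


def thread_vector(posts, floor=0.2):
    """Weighted bag-of-words for a thread; each post is a list of tokens."""
    vector = Counter()
    for weight, tokens in zip(parabolic_weights(len(posts), floor), posts):
        for token in tokens:
            vector[token] += weight
    return vector


# Toy, made-up thread of five posts.
toy_thread = [
    ["wifi", "driver", "fails"],
    ["which", "kernel"],
    ["kernel", "version"],
    ["try", "driver"],
    ["driver", "fixed"],
]
weights = parabolic_weights(len(toy_thread))
vector = thread_vector(toy_thread)
```

A vector built this way could then be passed to any of the clustering methods the abstract lists in place of an unweighted term-count vector.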


SIAM International Conference on Data Mining | 2014

A constrained hidden Markov model approach for non-explicit citation context extraction

Parikshit Sondhi; ChengXiang Zhai

In this paper we present a constrained hidden Markov model (HMM) based approach for extracting non-explicit citing sentences in research articles. Our method involves first independently training a separate HMM for each citation in the article being processed and then performing a constrained joint inference to label non-explicit citing sentences. Results on a standard test collection show that our method significantly outperforms the baselines and is comparable to state-of-the-art approaches.


International Journal of Mobile Communications | 2014

A text based drug query system for mobile phones

Akhil Langer; Rohit Banga; Ankush Mittal; L.V. Subramaniam; Parikshit Sondhi

Dissemination of medical information using mobile phones is still at a nascent stage because of their limitations, such as low penetration of mobile internet and small screen size. We present the design of a drug QA system that could be used to provide information about medicines over short message service (SMS). We begin with a survey of the drug information domain and classify drug-related queries into a set of predefined classes. Our system uses several natural language processing tools coupled with machine learning classification techniques to process drug-information queries. We focus on developing a natural language interface that allows users to be flexible in phrasing their queries, and we attain an accuracy of 81% in classifying drug-related questions. We conclude that it is feasible and cheap to deploy such a system to encourage the practice of evidence-based medicine.


International Conference on Distributed Computing and Internet Technology | 2012

Parallelization of PageRank on multicore processors

Tarun Kumar; Parikshit Sondhi; Ankush Mittal

PageRank is a prominent metric used by search engines for ranking search results. The PageRank of a particular web page is a function of the PageRank values of all the web pages pointing to it. The algorithm works on a large number of web pages and is thus computationally intensive. The hardware requirement is currently met by connecting thousands of computers in a cluster, but faster and less complex alternatives to such a system can be found in multi-core processors. In this paper, we identify the major issues involved in porting the PageRank algorithm to the Cell BE processor and CUDA, and their possible solutions. The work is evaluated on three input graphs of different sizes, ranging from 0.35 million to 1.3 million nodes. Our results show that the PageRank algorithm runs 2.8 times faster on CUDA than on a dual-core 3.0 GHz Xeon.
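The dependency described in this abstract, where a page's rank is computed from the ranks of the pages linking to it, is usually evaluated by power iteration. The following is a minimal sequential sketch of that computation, not the Cell BE or CUDA port the paper describes. The damping factor of 0.85, the iteration count, the handling of pages without out-links, and the toy graph are assumptions made for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of out-neighbors."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                # Each page passes an equal share of its rank to every page it links to.
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Toy, made-up graph; every link target must also appear as a key.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
top_page = max(ranks, key=ranks.get)
```

Each iteration reads only the previous iteration's ranks, so the per-node work within one iteration is independent, which is the property a parallel implementation would rely on.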

Collaboration


Top Co-Authors

Purushottam Raj

Indian Institute of Technology Roorkee

Vinod Kumar

Indian Institute of Technology Roorkee
