Sumeet Agarwal
Indian Institute of Technology Delhi
Publication
Featured research published by Sumeet Agarwal.
PLOS Computational Biology | 2010
Sumeet Agarwal; Charlotte M. Deane; Mason A. Porter; Nick S. Jones
The idea of “date” and “party” hubs has been influential in the study of protein–protein interaction networks. Date hubs display low co-expression with their partners, whilst party hubs have high co-expression. It was proposed that party hubs are local coordinators whereas date hubs are global connectors. Here, we show that the reported importance of date hubs to network connectivity can in fact be attributed to a tiny subset of them. Crucially, these few, extremely central, hubs do not display particularly low expression correlation, undermining the idea of a link between this quantity and hub function. The date/party distinction was originally motivated by an approximately bimodal distribution of hub co-expression; we show that this feature is not always robust to methodological changes. Additionally, topological properties of hubs do not in general correlate with co-expression. However, we find significant correlations between interaction centrality and the functional similarity of the interacting proteins. We suggest that thinking in terms of a date/party dichotomy for hubs in protein interaction networks is not meaningful, and it might be more useful to conceive of roles for protein-protein interactions rather than for individual proteins.
international conference on data mining | 2007
Sumeet Agarwal; Shantanu Godbole; Diwakar Punjani; Shourya Roy
Noise is a stark reality in real-life data. It has a significant impact in text analytics in particular, where data cleaning forms a very large part of the data processing cycle. Noisy unstructured text is common in informal settings such as online chat, SMS, email, newsgroups and blogs, in automatically transcribed text from speech, and in automatically recognized text from printed or handwritten material. Gigabytes of such data are being generated every day on the Internet, in contact centers, and on mobile phones. Researchers have looked at various text mining issues such as pre-processing and cleaning noisy text, information extraction, rule learning, and classification for noisy text. This paper focuses on the issues faced by automatic text classifiers in analyzing noisy documents coming from various sources. The goal of this paper is to bring out and study the effect of different kinds of noise on automatic text classification. Does the nature of such text warrant moving beyond traditional text classification techniques? We present detailed experimental results with simulated noise on the Reuters-21578 and 20-newsgroups benchmark datasets. We also present results on real-life noisy datasets from various CRM domains.
Neurocomputing | 2008
Sumeet Agarwal; V. Vijaya Saradhi; Harish Karnick
We apply kernel-based machine learning methods to online learning situations, and look at the related requirement of reducing the complexity of the learnt classifier. Online methods are particularly useful in situations which involve streaming data, such as medical or financial applications. We show that the concept of span of support vectors can be used to build a classifier that performs reasonably well while satisfying given space and time constraints, thus making it potentially suitable for such online situations. The span-based heuristic is observed to be effective under stringent memory limits (that is, when the number of support vectors a machine can hold is very small).
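As a rough illustration of the memory-limited online setting described above, the following sketch shows an online kernel perceptron that never stores more than a fixed number of support vectors. It is not the paper's method: the class name `BudgetKernelPerceptron`, the RBF kernel, and the rule of discarding the oldest support vector are all assumptions made for illustration, with the oldest-first rule standing in for the span-based heuristic.

```python
import math


def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length sequences."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class BudgetKernelPerceptron:
    """Online kernel perceptron holding at most `budget` support vectors.

    Assumption: when the budget is exceeded, the oldest support vector is
    dropped. A span-based heuristic would pick the vector to drop differently.
    """

    def __init__(self, budget, gamma=1.0):
        self.budget = budget
        self.gamma = gamma
        self.support = []  # (x, y) pairs, oldest first

    def decision(self, x):
        """Signed score: sum of label-weighted kernel values."""
        return sum(y * rbf(sx, x, self.gamma) for sx, y in self.support)

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1

    def update(self, x, y):
        """Process one streamed example with label y in {-1, +1}."""
        if y * self.decision(x) <= 0:        # mistake: store the example
            self.support.append((x, y))
            if len(self.support) > self.budget:
                self.support.pop(0)          # placeholder removal rule
```

Each call to `update` costs time proportional to the budget, so both memory and per-example time stay bounded as the stream grows.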
PeerJ | 2013
Yoonjoo Choi; Sumeet Agarwal; Charlotte M. Deane
Loops are irregular structures which connect two secondary structure elements in proteins. They often play important roles in function, including enzyme reactions and ligand binding. Despite their importance, their structure remains difficult to predict. Most protein loop structure prediction methods sample local loop segments and score them. In particular, protein loop classification and database search methods depend heavily on local properties of loops. Here we examine the distance between a loop’s end points (span). We find that the distribution of loop span appears to be independent of the number of residues in the loop; in other words, the separation between the anchors of a loop does not increase with an increase in the number of loop residues. Loop span is also unaffected by the secondary structures at the end points, unless the two anchors are part of an anti-parallel beta sheet. As loop span appears to be independent of global properties of the protein, we suggest that its distribution can be described by a random fluctuation model based on the Maxwell–Boltzmann distribution. It is believed that the primary difficulty in protein loop structure prediction comes from the number of residues in the loop. Following the idea that loop span is an independent local property, we investigate its effect on protein loop structure prediction and show how normalised span (loop stretch) is related to the structural complexity of loops. Highly contracted loops are more difficult to predict than stretched loops.
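For reference, the standard form of the Maxwell–Boltzmann density is given below. The abstract does not state the paper's exact parameterisation, so the symbols here are assumptions: d denotes the loop span and a is a scale parameter.

```latex
f(d; a) = \sqrt{\frac{2}{\pi}} \, \frac{d^{2}}{a^{3}} \exp\!\left(-\frac{d^{2}}{2a^{2}}\right), \qquad d \ge 0,
\qquad \text{with mode } \sqrt{2}\,a \text{ and mean } 2a\sqrt{\frac{2}{\pi}} .
```

In this form a single scale parameter fixes both the most likely span and the mean span, and the number of loop residues does not appear in it.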
computational intelligence and data mining | 2014
Anil Kumar; Nitesh Kumar; Muzammil Hussain; Santanu Chaudhury; Sumeet Agarwal
Cross-domain recommendation systems exploit tags, textual descriptions or ratings available for items in one domain to recommend items in multiple domains. Handling unstructured or unannotated item information is, however, a challenge. Topic modeling offers a popular method for deducing structure in such data corpora. In this paper, we introduce the concept of a common latent semantic space, spanning multiple domains, using topic modeling of semantically clustered vocabularies of distinct domains. The intuition here is to use explicitly determined semantic relationships between non-identical, but possibly semantically equivalent, words in multiple domain vocabularies, in order to capture relationships across information obtained in distinct domains. The popular WordNet-based ontology is used to measure semantic relatedness between textual words. The experimental results show that there is a marked improvement in the precision of predicting user preferences for items in one domain when given the preferences in another domain.
inductive logic programming | 2015
Ashwin Srinivasan; Michael Bain; Deepika Vatsa; Sumeet Agarwal
The identification of transition models of biological systems (Petri net models, for example) in noisy environments has not been examined to any significant extent, although such models have been used to represent the ideal behaviour of metabolic, signalling and genetic networks. Progress has been made in identifying such models from sequences of qualitative states of the system; and, more recently, with additional logical constraints as background knowledge. Both forms of model identification assume the data are correct, which is often unrealistic since biological systems are inherently stochastic. In this paper, we model the transition noise that can affect model identification as a Markov process where the corresponding transition functions are assumed to be known. We investigate, in the presence of this transition noise, the identification of transitions in a target model. The experiments are reconstructions of known networks from simulated data with varying amounts of transition noise added. In each case, the target model traces a specific trajectory through the state space. Model structures that explain the noisy state sequences are obtained based on recent work which formulates the identification of transition models as logical consequence-finding. With noisy data, we need to extend this formulation by allowing the abduction of new transitions. The resulting structures may be both incorrect and incomplete with respect to the target model. We quantify the ability to identify the transitions in the target model, using probability estimates computed from transition sequences using PRISM. Empirical results suggest that we are able to identify correctly the transitions in the target model with transition noise levels ranging from low to high values.
acm multimedia | 2017
Abhimanyu Dubey; Sumeet Agarwal
The study of virality and information diffusion is rapidly gaining traction in the computational social sciences. Computer vision and social network analysis research have also focused on understanding the impact of content and information diffusion in making content viral, but prior approaches have not performed as well as on other, more traditional classification tasks. In this paper, we present a novel pairwise reformulation of the virality prediction problem as an attribute prediction task and develop a novel algorithm to model image virality on online media using a pairwise neural network. Our model provides significant insights into the features that are responsible for promoting virality and surpasses the existing state-of-the-art by a 12% average improvement in prediction. We also investigate the effect of external category supervision on relative attribute prediction and observe an increase in prediction accuracy for the same across several attribute learning datasets.
Artificial Intelligence Review | 2015
Tanya Raghuvanshi; Shraddha Chaudhary; Varnica Jain; Sumeet Agarwal; Santanu Chaudhury
This paper proposes an Autonomous Machine Vision system which grasps a textureless object from clutter in a single plane, rearranges it for proper placement and then places it using vision. It contributes a unique vision-based pose estimation algorithm, collision-free path planning and a dynamic Change-Over algorithm for final placement.
international congress on big data | 2014
Anil Kumar; Vikas Kapur; Apangshu Saha; Rajeev Kumar Gupta; Arun Singh; Santanu Chaudhury; Sumeet Agarwal
Latent rating pattern sharing based approaches for cross-domain recommendations can alleviate the data sparsity problem by pulling the knowledge available from other domains and are faster in prediction. However, since the prediction quality depends on the number of user and item classes chosen for a given data set, the model training time becomes prohibitively large even for medium-sized data sets. In this paper, we propose a MapReduce-based distributed implementation of the cross-domain recommendation algorithm. Our implementation has the capability to run on modern distributed computing frameworks, such as Hadoop and Twister, that utilize commodity machines. The experimental results show that the training time increases only linearly with the number of user and item classes, compared with the exponential increase seen in its sequential counterpart.
European Journal of Radiology | 2018
Anirban Sengupta; Sumeet Agarwal; P. K. Gupta; Sunita Ahlawat; Rana Patir; Rakesh Gupta; Anup Singh
PURPOSE: High grade gliomas (HGGs) are infiltrative in nature. Differentiation between vasogenic edema and non-contrast enhancing tumor is difficult as both appear hyperintense in T2-W/FLAIR images. Most studies involving differentiation between vasogenic edema and non-enhancing tumor consider radiologist-based tumor delineation as the ground truth. However, analysis by a radiologist can be subjective and there remain both inter- and intra-rater differences. The objective of the current study is to develop a methodology for differentiation between non-enhancing tumor and vasogenic edema in HGG patients based on T1 perfusion MRI parameters, using a ground truth which is independent of a radiologist's manual delineation of the tumor.
MATERIAL AND METHODS: This study included 9 HGG patients with pre- and post-surgery MRI data and 9 metastasis patients with pre-surgery MRI data. MRI data included conventional T1-W, T2-W, and FLAIR images and DCE-MRI dynamic images. In this study, the authors hypothesize that surgically resected non-enhancing FLAIR hyperintense tissue, which was obtained using pre- and post-surgery MRI images of glioma patients, should consist largely of non-enhancing tumor. Hence this could be used as an alternative ground truth for the non-enhancing tumor region. Histological examination of the resected tissue was done for validation. Vasogenic edema was obtained from the non-enhancing FLAIR hyperintense region of metastasis patients, as they have a clear boundary between enhancing tumor and edema. DCE-MRI data analysis was performed to obtain T1 perfusion MRI parameters. Support Vector Machine (SVM) classification was performed using T1 perfusion MRI parameters to differentiate between non-enhancing tumor and vasogenic edema. Receiver-operating-characteristic (ROC) analysis was done on the results of the SVM classifier. For improved classification accuracy, the SVM output was post-processed via neighborhood smoothing.
RESULTS: Histology results showed that resected tissue consists largely of tumorous tissue with 7.21 ± 4.05% edema and a small amount of healthy tissue. SVM-based classification provided a misclassification error of 8.4% in differentiation between non-enhancing tumor and vasogenic edema, which was further reduced to 2.4% using neighborhood smoothing.
CONCLUSION: The current study proposes a semiautomatic method for segmenting non-enhancing tumor from vasogenic edema in HGG patients, based on an SVM classifier trained on an alternative ground truth to a radiologist's manual delineation of the tumor. The proposed methodology may prove to be a useful tool for pre- and post-operative evaluation of glioma patients.
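The abstract does not say how the neighborhood smoothing was carried out. The sketch below shows one common variant, offered as an assumption rather than the study's procedure: a majority vote over each pixel's 3x3 neighbourhood in a 2-D map of binary classifier labels. The function names and the label coding (0 = edema, 1 = tumor) are hypothetical.

```python
def smooth_labels(labels):
    """Majority vote over each pixel's 3x3 neighbourhood.

    `labels` is a 2-D list of 0/1 class labels (assumed coding: 0 = edema,
    1 = tumor). Border pixels vote with whatever neighbours exist, and a
    tie keeps the pixel's original label.
    """
    rows, cols = len(labels), len(labels[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            votes = [labels[i][j]
                     for i in range(max(0, r - 1), min(rows, r + 2))
                     for j in range(max(0, c - 1), min(cols, c + 2))]
            ones = sum(votes)
            zeros = len(votes) - ones
            if ones > zeros:
                out[r][c] = 1
            elif zeros > ones:
                out[r][c] = 0
            else:
                out[r][c] = labels[r][c]
    return out


def misclassification_rate(pred, truth):
    """Fraction of pixels whose predicted label differs from the truth."""
    total = sum(len(row) for row in truth)
    wrong = sum(p != t
                for pred_row, truth_row in zip(pred, truth)
                for p, t in zip(pred_row, truth_row))
    return wrong / total
```

Smoothing of this kind removes isolated misclassified pixels inside an otherwise uniform region, which is the usual reason post-processing lowers the error of a per-pixel classifier.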