Sourangshu Bhattacharya
Indian Institute of Technology Kharagpur
Publication
Featured research published by Sourangshu Bhattacharya.
international conference on machine learning | 2007
Sourangshu Bhattacharya; Chiranjib Bhattacharyya; Nagasuma Chandra
Structural alignments are the most widely used tools for comparing proteins with low sequence similarity. The main contribution of this paper is to derive various kernels on proteins from structural alignments, which do not use sequence information. Central to the kernels is a novel alignment algorithm which matches substructures of fixed size using spectral graph matching techniques. We derive positive semi-definite kernels which capture the notion of similarity between substructures. Using these as base kernels, more sophisticated kernels on protein structures are proposed. To empirically evaluate the kernels, we used 40% sequence non-redundant structures from 15 different SCOP superfamilies. When used with SVMs, the kernels show competitive performance with CE, a state-of-the-art structure comparison program.
BMC Bioinformatics | 2006
Sourangshu Bhattacharya; Chiranjib Bhattacharyya; Nagasuma Chandra
Background: In recent times, there has been an exponential rise in the number of protein structures in databases, e.g. PDB. The design of fast algorithms capable of querying such databases is therefore becoming an increasingly important research issue. This paper reports an algorithm, motivated by spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of the residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. Results: Experimental results show that the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on the SCOP 95% non-redundant database, for fold similarity with 5 proteins from different SCOP classes, show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins, which is not possible with most state-of-the-art tools such as Dali.
BMC Bioinformatics | 2007
Sourangshu Bhattacharya; Chiranjib Bhattacharyya; Nagasuma Chandra
Background: The design of protein structure comparison algorithms is an important research issue with far-reaching implications. In this article, we describe a protein structure comparison scheme which is capable of detecting correct alignments even in difficult cases, e.g. non-topological similarities. The proposed method computes protein structure alignments by comparing small substructures, called neighborhoods. Two different types of neighborhoods, sequence and structure, are defined, and two algorithms arising out of the scheme are detailed. A new method for computing equivalences having non-topological similarities from pairwise similarity scores is described. A novel and fast technique for comparing sequence neighborhoods is also developed. Results: The experimental results show that the current programs perform better on the Fischer and Novotny benchmark datasets than state-of-the-art programs, e.g. DALI, CE and SSM. Our programs were also found to calculate correct alignments for proteins with a large number of indels and internal repeats. Finally, the sequence neighborhood based program was used in extensive fold and non-topological similarity detection experiments. The accuracy of the fold detection experiments with the new measure of similarity was found to be similar to or better than that of the standard algorithm CE. Conclusion: A new scheme, resulting in two algorithms, has been developed, implemented and tested. The programs developed are accessible at http://mllab.csa.iisc.ernet.in/mp2/runprog.html.
conference on information and knowledge management | 2012
Sriram Srinivasan; Sourangshu Bhattacharya; Rudrasis Chakraborty
Segmentation of a string of English language characters into a sequence of words has many applications. Here, we study two applications in the internet domain. The first application is web domain segmentation, which is crucial for monetization of broken URLs. Second, we propose and study a novel application, Twitter hashtag segmentation, for increasing recall on Twitter searches. Existing methods for word segmentation use unsupervised language models. We find that when using multiple corpora, the joint probability model from multiple corpora performs significantly better than models from the individual corpora. Motivated by this, we propose a weighted joint probability model, with weights specific to each corpus. We propose to train the weights in a supervised manner using max-margin methods. The supervised probability models improve segmentation accuracy over joint probability models. Finally, we observe that the length of segments is an important parameter for word segmentation, and incorporate length-specific weights into our model. The length-specific models further improve segmentation accuracy over supervised probability models. For all models proposed here, the inference problem can be solved using a dynamic programming algorithm. We test our methods on five different datasets, two from web domains data and three from news headlines data from an LDC dataset. The supervised length-specific models show significant improvements over unsupervised single-corpus and joint probability models. Cross-testing between the datasets confirms that supervised probability models trained on all datasets, and length-specific models trained on news headlines data, generalize well. Segmentation of hashtags results in significant improvement in recall on searches for Twitter trends.
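The dynamic programming inference mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single-corpus unigram model rather than the paper's weighted multi-corpus or length-specific models; the function names, the toy counts, and the unknown-word penalty are all hypothetical and chosen only for illustration.

```python
import math

def segment(text, log_prob, max_len=20):
    """Split text into the word sequence with the highest total log-probability."""
    n = len(text)
    best = [float("-inf")] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last word in text[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + log_prob(text[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the string to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]

# Hypothetical toy unigram counts (not taken from the paper).
COUNTS = {"new": 50, "york": 30, "times": 40, "newyork": 1, "time": 45, "s": 5}
TOTAL = sum(COUNTS.values())

def toy_log_prob(word):
    """Unigram log-probability; unknown words get a penalty that grows with length."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    return math.log(1.0 / (TOTAL * 10 ** len(word)))

print(segment("newyorktimes", toy_log_prob))
```

A weighted joint model in the spirit of the abstract could be plugged in by replacing `toy_log_prob` with a function that combines per-corpus scores; the recursion itself would not change.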
IEEE Transactions on Knowledge and Data Engineering | 2016
Abir De; Sourangshu Bhattacharya; Sourav Sarkar; Niloy Ganguly; Soumen Chakrabarti
Predicting plausible links that may emerge between pairs of nodes is an important task in social network analysis, with over a decade of active research. Here, we propose a novel framework for link prediction. It integrates signals from node features, the existing local link neighborhood of a node pair, community-level link density, and global graph properties. Our framework uses a stacked two-level learning paradigm. At the lower level, the first two kinds of features are processed by a novel local learner. Its outputs are then integrated with the last two kinds of features by a conventional discriminative learner at the upper-level. We also propose a new stratified sampling scheme for evaluating link prediction algorithms in the face of an extremely large number of potential edges, out of which very few will ever materialize. It is not tied to a specific application of link prediction, but robust to a range of application requirements. We report on extensive experiments with seven benchmark datasets and over five competitive baseline systems. The system we present consistently shows at least 10 percent accuracy improvement over state-of-the-art, and over 30 percent improvement in some cases. We also demonstrate, through ablation, that our features are complementary in terms of the signals and accuracy benefits they provide.
conference on information and knowledge management | 2017
Krunal Parmar; Samuel Bushi; Sourangshu Bhattacharya; Surender Kumar
Promotional listing of products or advertisements is a major source of revenue for online retail companies. These advertisements are often sold in the guaranteed delivery market, the serving of which critically depends on the ability to predict supply, or potential impressions, from a target segment of users. In this paper, we study the problem of predicting user visits or potential ad impressions to online retail websites, based on historical time-stamps. We explore time-series and temporal point process models. We find that a successful model must encompass three properties of the data: (1) temporally non-homogeneous rates, (2) self-excitation and (3) handling of special events. We propose a novel non-homogeneous Hawkes process based model for this purpose, and a new algorithm for fitting this model without overfitting the self-excitation part. We validate the proposed model and algorithm using multiple large-scale ad-serving datasets from a top online retail company in India.
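As a rough illustration of the first two properties listed in this abstract, the sketch below evaluates the conditional intensity of a Hawkes process with a time-varying base rate. The sinusoidal base rate, the exponential excitation kernel, and all parameter values are assumptions made for this example; the abstract does not specify the paper's actual parameterization, and special-event handling is not modeled here.

```python
import math

def base_rate(t, mean=5.0, amplitude=2.0, period=24.0):
    """Hypothetical non-homogeneous base rate with a daily cycle (t in hours)."""
    return mean + amplitude * math.sin(2.0 * math.pi * t / period)

def intensity(t, history, alpha=0.5, beta=1.0):
    """Conditional intensity: base rate plus self-excitation from past events.

    Each past event at time s < t adds alpha * beta * exp(-beta * (t - s)),
    an exponential kernel assumed here for simplicity.
    """
    excitation = sum(
        alpha * beta * math.exp(-beta * (t - s)) for s in history if s < t
    )
    return base_rate(t) + excitation

visits = [9.0, 9.5, 9.8]          # hypothetical visit time-stamps
print(intensity(10.0, visits))    # rate shortly after a burst of visits
print(intensity(10.0, []))        # rate with no recent visits
```

With `alpha` below 1, each event triggers fewer than one follow-up event on average, which keeps the process stable.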
international world wide web conferences | 2018
Abir De; Sourangshu Bhattacharya; Niloy Ganguly
Networked opinion diffusion in online social networks (OSN) is governed by two genres of opinions: endogenous opinions, which are driven by the influence of social contacts between users, and exogenous opinions, which are formed by external effects such as news and feeds. Such duplex opinion dynamics is led by users belonging to two categories: organic users, who generally post endogenous opinions, and extrinsic users, who are susceptible to externalities and mostly post exogenous messages. Precise demarcation of endogenous and exogenous messages offers an important cue for opinion modeling, thereby enhancing its predictive performance. On the other hand, accurate user selection helps detect extrinsic users, which in turn helps in opinion shaping. In this paper, we design CherryPick, a novel learning machinery that classifies the opinions and users by solving a joint inference task over the message and user sets, from a temporal stream of sentiment messages. Furthermore, we validate the efficacy of our proposal from both modeling and shaping perspectives. For the latter, we formulate the opinion shaping problem in a novel framework of stochastic optimal control, in which the selected extrinsic users optimally post exogenous messages so as to guide the opinions of others in a desired way. On five datasets crawled from Twitter, CherryPick offers a significant accuracy boost in terms of opinion forecasting against several competitors. Furthermore, it can precisely determine the quality of a set of control users, which, together with the proposed online shaping strategy, consistently steers the opinion dynamics more effectively than several state-of-the-art baselines.
international conference of distributed computing and networking | 2018
Chandan Misra; Swastik Haldar; Sourangshu Bhattacharya; Soumya K. Ghosh
The growth of big data in domains such as Earth Sciences, Social Networks, Physical Sciences, etc. has led to an immense need for efficient and scalable linear algebra operations, e.g. matrix inversion. Existing methods for efficient and distributed matrix inversion using big data platforms rely on LU decomposition based block-recursive algorithms. However, these algorithms are complex and require a lot of side calculations, e.g. matrix multiplication, at various levels of recursion. In this paper, we propose a different scheme based on Strassen's matrix inversion algorithm (mentioned in Strassen's original paper in 1969), which uses far fewer operations at each level of recursion. We implement the proposed algorithm and, through extensive experimentation, show that it is more efficient than state-of-the-art methods. Furthermore, we provide a detailed theoretical analysis of the proposed algorithm, and derive theoretical running times which match closely with the empirically observed wall clock running times, thus explaining the U-shaped behaviour w.r.t. block sizes.
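The block recursion behind Strassen's inversion scheme can be sketched on a single machine as follows. This is only an illustration under stated assumptions: it uses plain Python lists and naive multiplication instead of a distributed platform, it requires the matrix size to be a power of two, and it assumes every leading block encountered is invertible (true, for example, for symmetric positive definite matrices). The helper names are hypothetical.

```python
def matmul(x, y):
    """Naive matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def sub(x, y):
    """Element-wise difference x - y."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def strassen_inverse(a):
    """Invert a 2^k x 2^k matrix by recursing on its four half-size blocks."""
    n = len(a)
    if n == 1:
        return [[1.0 / a[0][0]]]
    h = n // 2
    a11 = [row[:h] for row in a[:h]]
    a12 = [row[h:] for row in a[:h]]
    a21 = [row[:h] for row in a[h:]]
    a22 = [row[h:] for row in a[h:]]
    r1 = strassen_inverse(a11)     # inverse of the top-left block
    r2 = matmul(a21, r1)
    r3 = matmul(r1, a12)
    r4 = matmul(a21, r3)
    r5 = sub(r4, a22)              # negated Schur complement of a11
    r6 = strassen_inverse(r5)
    c12 = matmul(r3, r6)
    c21 = matmul(r6, r2)
    c11 = sub(r1, matmul(r3, c21))
    c22 = [[-v for v in row] for row in r6]
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom

a = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
inv = strassen_inverse(a)
print(matmul(a, inv))  # should be close to the identity matrix
```

Each level performs two recursive inversions and six half-size multiplications. In a distributed setting, the block size at which the recursion stops would be a tuning parameter.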
IEEE Transactions on Circuits and Systems I-regular Papers | 2018
Praful P. Pai; Pradyut Kumar Sanki; Sudeep K. Sahoo; Arijit De; Sourangshu Bhattacharya; Swapna Banerjee
Near infrared photoacoustic spectroscopy is utilized for the development of a continuous non-invasive glucose monitoring system for diabetics. A portable embedded system for taking photoacoustic measurements on tissues to estimate glucose concentration is implemented using a field programmable gate array (FPGA). The back-end architecture for high-speed data acquisition and de-noising of photoacoustic measurements operates at 274.823 MHz on a Xilinx Virtex-II Pro FPGA. The glucose measurement technique is verified in vitro on glucose solutions and in vivo on tissues, with photoacoustic signal amplitude varying linearly with sample glucose concentration. A kernel-based regression algorithm using multiple features of the photoacoustic signal is used to estimate glucose concentration from photoacoustic measurements. The calibration algorithm provides superior performance over previous efforts, with a mean absolute relative difference of 8.84% and a Clarke Error Grid distribution of 92.86% and 7.14% over Zones A and B of the grid. A cloud computing platform for automated monitoring of blood glucose levels is proposed to enable individuals with diabetes to connect with doctors and caretakers. The developed system is connected to the cloud service using a mobile device, which facilitates implementation of computationally intensive calibration tasks and the storage and analysis of measurement data for treatment and monitoring.
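The mean absolute relative difference (MARD) quoted in this abstract is a standard accuracy metric for glucose estimates. The short sketch below shows how it is commonly computed; the function name and the sample readings are hypothetical and are not data from the paper.

```python
def mard(reference, estimated):
    """Mean absolute relative difference, in percent, between paired readings."""
    if len(reference) != len(estimated) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    ratios = [abs(e - r) / r for r, e in zip(reference, estimated)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical reference and estimated glucose values in mg/dL.
reference = [100.0, 200.0, 150.0]
estimated = [110.0, 180.0, 150.0]
print(mard(reference, estimated))
```

A lower MARD means the estimates track the reference measurements more closely.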
advances in social networks analysis and mining | 2017
Sankarshan Mridha; Sayan Ghosh; Robin Singh; Sourangshu Bhattacharya; Niloy Ganguly
In recent times, people regularly discuss poor travel experiences caused by road closure incidents on social networking sites. One fallout of these road-blocking incidents is a dynamic shift in regular taxi pickup locations. Although traffic monitoring from social media content has lately gained widespread interest, no recent work has tried to understand this relocation of taxi pickup hotspots during road closures. In this work, we try to predict the taxi pickup hotspots during various road closure incidents using past taxi pickup trends. We propose a two-step methodology. First, we identify and extract road closure information from social network posts. Second, leveraging the inferred knowledge, we predict taxi pickup hotspots near the activity location with an average accuracy of approximately 86.04%, where the predicted locations are within an average radius of only 0.011 mile from the original hotspots.