Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Suresh Venkatasubramanian is active.

Publication


Featured research published by Suresh Venkatasubramanian.


International World Wide Web Conference | 1998

The connectivity server: fast access to linkage information on the Web

Krishna Bharat; Andrei Z. Broder; Monika Rauch Henzinger; Puneet Kumar; Suresh Venkatasubramanian

We have built a server that provides linkage information for all pages indexed by the AltaVista search engine. In its basic operation, the server accepts a query consisting of a set L of one or more URLs and returns a list of all pages that point to pages in L (predecessors) and a list of all pages that are pointed to from pages in L (successors). More generally, the server can produce the entire neighbourhood (in the graph theory sense) of L up to a given distance and can include information about all links that exist among pages in the neighbourhood. Although some of this information can be retrieved directly from AltaVista or other search engines, these engines are not optimized for this purpose and the process of constructing the neighbourhood of a given set of pages is slow and laborious. In contrast, our prototype server needs less than 0.1 ms per result URL. So far we have built two applications that use the Connectivity Server: a direct interface that permits fast navigation of the Web via the predecessor/successor relation, and a visualization tool for the neighbourhood of a given set of pages. We envisage numerous other applications such as ranking, visualization, and classification.
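
For concreteness, here is a minimal sketch of a predecessor/successor neighbourhood query over a small in-memory link graph. The dictionary representation and function names are illustrative assumptions; the actual server operated over the AltaVista index, not a Python dict.

```python
# Sketch of a connectivity-server-style neighbourhood query.
# `links` maps each URL to the set of URLs it points to (assumption).
from collections import deque

def neighbourhood(links, start_urls, distance):
    """Return all pages within `distance` link-steps of `start_urls`,
    following links in both directions (predecessors and successors)."""
    # Build the reverse graph so predecessor lookups are cheap.
    rlinks = {}
    for src, dsts in links.items():
        for dst in dsts:
            rlinks.setdefault(dst, set()).add(src)

    seen = set(start_urls)
    frontier = deque((u, 0) for u in start_urls)
    while frontier:
        url, d = frontier.popleft()
        if d == distance:
            continue
        for nxt in links.get(url, set()) | rlinks.get(url, set()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen

links = {"a": {"b", "c"}, "b": {"c"}, "d": {"a"}}
print(neighbourhood(links, {"a"}, 1))  # {'a', 'b', 'c', 'd'} (set order may vary)
```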


Knowledge Discovery and Data Mining | 2015

Certifying and Removing Disparate Impact

Michael Feldman; Sorelle A. Friedler; John Moeller; Carlos Scheidegger; Suresh Venkatasubramanian

What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender) and an explicit description of the process. When computers are involved, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the process, we propose making inferences based on the data it uses. We present four contributions. First, we link disparate impact to a measure of classification accuracy that, while known, has received relatively little attention. Second, we propose a test for disparate impact based on how well the protected class can be predicted from the other attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.
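
The legal notion the paper starts from, the "four-fifths rule," can be made concrete with a short sketch: compute the ratio of positive-outcome rates between the protected and unprotected groups and flag ratios below 0.8. The data and names below are illustrative assumptions; the paper's actual test instead measures how well the protected attribute can be predicted from the remaining ones.

```python
# Minimal sketch of the disparate impact ("80% rule") ratio.
def disparate_impact(outcomes, protected):
    """outcomes: list of 0/1 decisions; protected: parallel list of 0/1 flags."""
    pos_prot = sum(o for o, p in zip(outcomes, protected) if p)
    pos_unprot = sum(o for o, p in zip(outcomes, protected) if not p)
    n_prot = sum(protected)
    n_unprot = len(protected) - n_prot
    return (pos_prot / n_prot) / (pos_unprot / n_unprot)

y = [1, 0, 0, 0, 1, 1, 1, 0]        # hiring decisions (toy data)
prot = [1, 1, 1, 1, 0, 0, 0, 0]     # protected-group membership
ratio = disparate_impact(y, prot)
print(f"{ratio:.2f}",
      "flags disparate impact" if ratio < 0.8 else "passes the 80% rule")
```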


NeuroImage | 2009

The geometric median on Riemannian manifolds with application to robust atlas estimation

P. Thomas Fletcher; Suresh Venkatasubramanian; Sarang C. Joshi

One of the primary goals of computational anatomy is the statistical analysis of anatomical variability in large populations of images. The study of anatomical shape is inherently related to the construction of transformations of the underlying coordinate space, which map one anatomy to another. It is now well established that representing the geometry of shapes or images in Euclidean spaces undermines our ability to represent natural variability in populations. In our previous work we have extended classical statistical analysis techniques, such as averaging, principal components analysis, and regression, to Riemannian manifolds, which are more appropriate representations for describing anatomical variability. In this paper we extend the notion of robust estimation, a well established and powerful tool in traditional statistical analysis of Euclidean data, to manifold-valued representations of anatomical variability. In particular, we extend the geometric median, a classic robust estimator of centrality for data in Euclidean spaces. We formulate the geometric median of data on a Riemannian manifold as the minimizer of the sum of geodesic distances to the data points. We prove existence and uniqueness of the geometric median on manifolds with non-positive sectional curvature and give sufficient conditions for uniqueness on positively curved manifolds. Generalizing the Weiszfeld procedure for finding the geometric median of Euclidean data, we present an algorithm for computing the geometric median on an arbitrary manifold. We show that this algorithm converges to the unique solution when it exists. We exemplify the robustness of the estimation technique by applying the procedure to various manifolds commonly used in the analysis of medical images. Using this approach, we also present a robust brain atlas estimation technique based on the geometric median in the space of deformable images.
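
As a reference point, here is the classical Weiszfeld iteration for the Euclidean geometric median, the base case the paper generalizes by replacing weighted averages with steps along geodesics. The tolerance and iteration cap are arbitrary illustrative choices.

```python
# Classical Weiszfeld iteration for the Euclidean geometric median.
import numpy as np

def weiszfeld(points, iters=100, eps=1e-9):
    x = points.mean(axis=0)  # start at the centroid
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)
        d = np.maximum(d, eps)  # avoid division by zero at a data point
        w = 1.0 / d
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < eps:
            break
        x = x_new
    return x

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])  # one outlier
print(weiszfeld(pts))  # stays near the three clustered points, unlike the mean
```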


IEEE Transactions on Knowledge and Data Engineering | 2010

Closeness: A New Privacy Measure for Data Publishing

Ninghui Li; Tiancheng Li; Suresh Venkatasubramanian

The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain “identifying” attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of ℓ-diversity has been proposed to address this; ℓ-diversity requires that each equivalence class has at least ℓ well-represented values for each sensitive attribute. In this paper, we show that ℓ-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. Motivated by these limitations, we propose a new notion of privacy called “closeness.” We first present the base model t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We then propose a more flexible privacy model called (n,t)-closeness that offers higher utility. We describe our desiderata for designing a distance measure between two probability distributions and present two distance measures. We discuss the rationale for using closeness as a privacy measure and illustrate its advantages through examples and experiments.
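
A minimal sketch of a t-closeness check follows: compare each equivalence class's sensitive-value distribution to the overall distribution and require the distance to stay below t. The paper's preferred measure is the Earth Mover's Distance; total variation distance is used here only as a simplifying assumption to keep the sketch short.

```python
# Toy t-closeness check using total variation distance (assumption;
# the paper favours the Earth Mover's Distance).
from collections import Counter

def distribution(values):
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}

def total_variation(p, q):
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def satisfies_t_closeness(sensitive, classes, t):
    """sensitive: list of sensitive values; classes: parallel list of class ids."""
    overall = distribution(sensitive)
    by_class = {}
    for v, c in zip(sensitive, classes):
        by_class.setdefault(c, []).append(v)
    return all(total_variation(distribution(vs), overall) <= t
               for vs in by_class.values())

sensitive = ["flu", "flu", "cancer", "flu", "cancer", "flu"]
eq_class = [1, 1, 1, 2, 2, 2]
print(satisfies_t_closeness(sensitive, eq_class, t=0.2))  # True
```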


Symposium on Discrete Algorithms | 2006

Streaming and sublinear approximation of entropy and information distances

Sudipto Guha; Andrew McGregor; Suresh Venkatasubramanian

In most algorithmic applications that compare two distributions, information theoretic distances are more natural than standard lp norms. In this paper we design streaming and sublinear time property testing algorithms for entropy and various information theoretic distances. Batu et al. posed the problem of property testing with respect to the Jensen-Shannon distance. We present optimal algorithms for estimating bounded, symmetric f-divergences (including the Jensen-Shannon divergence and the Hellinger distance) between distributions in various property testing frameworks. Along the way, we close a (log n)/H gap between the upper and lower bounds for estimating entropy H, yielding an optimal algorithm over all values of the entropy. In a data stream setting (sublinear space), we give the first algorithm for estimating the entropy of a distribution. Our algorithm runs in polylogarithmic space and yields an asymptotic constant factor approximation scheme. An integral part of the algorithm is an interesting use of an F0 (the number of distinct elements in a set) estimation algorithm; we also provide other results along the space/time/approximation tradeoff curve. Our results have interesting structural implications that connect sublinear time and space constrained algorithms. The mediating model is the random order streaming model, which assumes the input is a random permutation of a multiset and was first considered by Munro and Paterson in 1980. We show that any property testing algorithm in the combined oracle model for calculating a permutation-invariant function can be simulated in the random order model in a single pass. This addresses a question raised by Feigenbaum et al. regarding the relationship between property testing and stream algorithms. Further, we give a polylog-space PTAS for estimating the entropy of a one pass random order stream. This bound cannot be achieved in the combined oracle (generalized property testing) model.
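
For reference, the two bounded symmetric f-divergences named above have simple exact (offline) formulas; this sketch computes the target quantities, not the streaming estimators themselves, and the example distributions are arbitrary.

```python
# Exact offline computation of the Jensen-Shannon divergence and the
# squared Hellinger distance between two finite distributions.
import math

def jensen_shannon(p, q):
    def kl(a, b):  # KL divergence in bits; terms with a_i = 0 contribute 0
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hellinger_sq(p, q):
    return 0.5 * sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(jensen_shannon(p, q), hellinger_sq(p, q))
```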


Journal of Mathematical Imaging and Vision | 2007

Curve Matching, Time Warping, and Light Fields: New Algorithms for Computing Similarity between Curves

Alon Efrat; Quanfu Fan; Suresh Venkatasubramanian

The problem of curve matching appears in many application domains, like time series analysis, shape matching, speech recognition, and signature verification, among others. Curve matching has been studied extensively by computational geometers, and many measures of similarity have been examined, among them being the Fréchet distance (sometimes referred to in folklore as the “dog-man” distance). A measure that is very closely related to the Fréchet distance but has never been studied in a geometric context is the Dynamic Time Warping measure (DTW), first used in the context of speech recognition. This measure is ubiquitous across different domains, a surprising fact because notions of similarity usually vary significantly depending on the application. However, this measure suffers from some drawbacks, most importantly the fact that it is defined between sequences of points rather than curves. Thus, the way in which a curve is sampled to yield such a sequence can dramatically affect the quality of the result. Some attempts have been made to generalize the DTW to continuous domains, but the resulting algorithms have exponential complexity. In this paper we propose similarity measures that attempt to capture the “spirit” of dynamic time warping while being defined over continuous domains, and present efficient algorithms for computing them. Our formulation leads to a very interesting connection with finding short paths in a combinatorial manifold defined on the input chains, and in a deeper sense relates to the way light travels in a medium of variable refractivity.
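
Discrete DTW itself, the sequence-level measure whose continuous analogue the paper develops, is a short dynamic program; the Euclidean ground distance and toy curves below are illustrative choices.

```python
# Classic quadratic-time dynamic time warping between two point sequences.
import math

def dtw(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stay on b[j]
                                 cost[i][j - 1],      # stay on a[i]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

curve1 = [(0, 0), (1, 1), (2, 0)]
curve2 = [(0, 0), (1, 0.8), (1.5, 0.9), (2, 0)]
print(dtw(curve1, curve2))
```

Note how the answer depends on how densely each curve is sampled, which is exactly the drawback the abstract points out.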


Symposium on Geometry Processing | 2005

Global registration of multiple 3D point sets via optimization-on-a-manifold

Shankar Krishnan; Pei Yean Lee; John Barrett Moore; Suresh Venkatasubramanian

We propose a novel algorithm to register multiple 3D point sets within a common reference frame using a manifold optimization approach. The point sets are obtained with multiple laser scanners or a mobile scanner. Unlike most prior algorithms, our approach performs an explicit optimization on the manifold of rotations, allowing us to formulate the registration problem as an unconstrained minimization on a constrained manifold. This approach exploits the Lie group structure of SO(3) and the simple representation of its associated Lie algebra so(3) in terms of R^3. Our contributions are threefold. First, we present a new analytic method based on singular value decompositions that yields a closed-form solution for simultaneous multiview registration in the noise-free scenario. Second, we use this method to derive a good initial estimate of a solution in the noise-free case. This initialization step may be of use in any general iterative scheme. Finally, we present an iterative scheme based on Newton's method on SO(3) that has locally quadratic convergence. We demonstrate the efficacy of our scheme on scan data taken both from the Digital Michelangelo project and from scans extracted from models, and compare it to some of the other well known schemes for multiview registration. In all cases, our algorithm converges much faster than the other approaches (in some cases orders of magnitude faster) and generates consistently higher quality registrations.
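
The flavour of the SVD-based closed form can be seen in its classical pairwise analogue, the orthogonal Procrustes / Kabsch solution for the best rotation between two corresponded point sets; the paper's contribution is the simultaneous multiview generalization, so this sketch only conveys the underlying SVD idea.

```python
# Pairwise rotation alignment via SVD (classical Kabsch / Procrustes).
import numpy as np

def best_rotation(P, Q):
    """Rotation R in SO(3) with R @ p_i ~= q_i after centering both sets."""
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Q0.T @ P0)
    # Force det(R) = +1 so we return a rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3))
angle = np.pi / 5
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
Q = P @ R_true.T  # rotate every point by R_true
print(np.allclose(best_rotation(P, Q), R_true))  # True
```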


IEEE Transactions on Mobile Computing | 2014

Multiple Target Tracking with RF Sensor Networks

Maurizio Bocca; Ossi Kaltiokallio; Neal Patwari; Suresh Venkatasubramanian

RF sensor networks are wireless networks that can localize and track people (or targets) without needing them to carry or wear any electronic device. They use the change in the received signal strength (RSS) of the links due to the movements of people to infer their locations. In this paper, we consider real-time multiple target tracking with RF sensor networks. We apply radio tomographic imaging (RTI), which generates images of the change in the propagation field, as if they were frames of a video. Our RTI method uses RSS measurements on multiple frequency channels on each link, combining them with a fade-level-based weighted average. We introduce methods, inspired by machine vision and adapted to the peculiarities of RTI, that enable accurate and real-time multiple target tracking. Several tests are performed in an open environment, a one-bedroom apartment, and a cluttered office environment. The results demonstrate that the system is capable of accurately tracking up to four targets in real time in cluttered indoor environments, even when their trajectories intersect multiple times, without mis-estimating the number of targets found in the monitored area. The highest average tracking error measured in the tests is 0.45 m with two targets, 0.46 m with three targets, and 0.55 m with four targets.
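
A hedged sketch of the channel-combination step described above: per-link RSS change is measured on several frequency channels and combined with fade-level-dependent weights. The specific weighting rule below (weight proportional to how far a channel's baseline RSS sits above the link's weakest channel) is an illustrative assumption, not the paper's exact formula.

```python
# Illustrative fade-level-weighted combination of multi-channel RSS changes.
import numpy as np

def combine_channels(rss_baseline, rss_now):
    """rss_baseline, rss_now: arrays of shape (n_links, n_channels) in dBm.
    Returns one weighted RSS-change value per link."""
    change = np.abs(rss_now - rss_baseline)
    # Assumed fade-level proxy: channels well above the per-link minimum
    # baseline RSS are treated as more informative and get larger weight.
    fade_level = rss_baseline - rss_baseline.min(axis=1, keepdims=True)
    weights = fade_level + 1e-6
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * change).sum(axis=1)

baseline = np.array([[-60.0, -72.0, -65.0], [-55.0, -58.0, -70.0]])
current = np.array([[-63.0, -71.0, -66.0], [-54.0, -65.0, -71.0]])
print(combine_channels(baseline, current))  # one change score per link
```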


Very Large Data Bases | 2004

Compressing large boolean matrices using reordering techniques

David S. Johnson; Shankar Krishnan; Jatin Chhugani; Subodh Kumar; Suresh Venkatasubramanian

Large boolean matrices are a basic representational unit in a variety of applications, with some notable examples being interactive visualization systems, mining large graph structures, and association rule mining. Designing space and time efficient scalable storage and query mechanisms for such large matrices is a challenging problem. We present a lossless compression strategy to store and access such large matrices efficiently on disk. Our approach is based on viewing the columns of the matrix as points in a very high dimensional Hamming space, and then formulating an appropriate optimization problem that reduces to solving an instance of the Traveling Salesman Problem on this space. Finding good solutions to large TSPs in high dimensional Hamming spaces is itself a challenging and little-explored problem -- we cannot readily exploit geometry to avoid the need to examine all N^2 inter-city distances and instances can be too large for standard TSP codes to run in main memory. Our multi-faceted approach adapts classical TSP heuristics by means of instance-partitioning and sampling, and may be of independent interest. For instances derived from interactive visualization and telephone call data we obtain significant improvement in access time over standard techniques, and for the visualization application we also make significant improvements in compression.
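
The reordering idea can be sketched in a few lines: treat columns as points in Hamming space, order them with a greedy nearest-neighbour tour, then run-length encode the rows. The greedy tour and plain RLE here are simplifying stand-ins for the paper's partitioned and sampled TSP heuristics and its actual on-disk encoding.

```python
# Toy column reordering for boolean-matrix compression.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def greedy_column_order(matrix):
    cols = list(zip(*matrix))  # columns as tuples of bits
    order = [0]
    remaining = set(range(1, len(cols)))
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda j: hamming(cols[last], cols[j]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

def run_length(row):
    runs, count = [], 1
    for prev, cur in zip(row, row[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

M = [[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1]]
order = greedy_column_order(M)
reordered = [[row[j] for j in order] for row in M]
print(order, [run_length(r) for r in reordered])  # longer runs after reordering
```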


Information Processing in Sensor Networks | 2013

Radio tomographic imaging and tracking of stationary and moving people via kernel distance

Yang Zhao; Neal Patwari; Jeff M. Phillips; Suresh Venkatasubramanian

Network radio frequency (RF) environment sensing (NRES) systems pinpoint and track people in buildings using changes in the signal strength measurements made by a wireless sensor network. It has been shown that such systems can locate people who do not participate in the system by carrying or wearing any radio device, even through walls, because of the changes that moving people cause to the static wireless sensor network. However, many such systems cannot locate stationary people. We present and evaluate a system which can locate stationary or moving people, without calibration, by using kernel distance to quantify the difference between two histograms of signal strength measurements. From five experiments, we show that our kernel distance-based radio tomographic localization system performs better than state-of-the-art NRES systems in different non-line-of-sight environments.
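
The kernel distance between two histograms has a simple closed form: with a bin-similarity kernel k, define kappa(P, Q) = sum_ij P[i] Q[j] k(i, j) and D(P, Q)^2 = kappa(P, P) + kappa(Q, Q) - 2 kappa(P, Q). A sketch follows; the Gaussian bin kernel, its bandwidth, and the toy histograms are illustrative assumptions, not the paper's configuration.

```python
# Kernel distance between two (normalized) RSS histograms.
import math

def kernel_distance(p, q, sigma=1.0):
    def k(i, j):  # assumed Gaussian similarity between histogram bins
        return math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
    def kappa(a, b):
        return sum(ai * bj * k(i, j)
                   for i, ai in enumerate(a)
                   for j, bj in enumerate(b))
    return math.sqrt(max(kappa(p, p) + kappa(q, q) - 2 * kappa(p, q), 0.0))

hist_static = [0.1, 0.8, 0.1, 0.0]   # RSS histogram with no one present (toy)
hist_person = [0.0, 0.3, 0.4, 0.3]   # histogram when someone is nearby (toy)
print(kernel_distance(hist_static, hist_person))
```

Unlike a bin-by-bin distance, the kernel distance stays small when probability mass merely shifts to nearby bins, which makes it robust to small RSS fluctuations.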

Collaboration


Dive into Suresh Venkatasubramanian's collaborations.

Top Co-Authors

Andrew McGregor

University of Massachusetts Amherst


Piotr Indyk

Massachusetts Institute of Technology
