Ashish P. Sanil | Researchain

Publication

Featured research published by Ashish P. Sanil.

Journal of Computational and Graphical Statistics | 2005

Secure Regression on Distributed Databases

Alan F. Karr; Xiaodong Lin; Ashish P. Sanil

This article presents several methods for performing linear regression on the union of distributed databases that preserve, to varying degrees, confidentiality of those databases. Such methods can be used by federal or state statistical agencies to share information from their individual databases, or to make such information available to others. Secure data integration, which provides the lowest level of protection, actually integrates the databases, but in a manner such that no database owner can determine the origin of any records other than its own. Regression, associated diagnostics, or any other analysis can then be performed on the integrated data. Secure multiparty computation, based on shared local statistics, effects the computations necessary to compute least squares estimators of regression coefficients and error variances by means of analogous local computations that are combined additively using the secure summation protocol. We also provide two approaches to model diagnostics in this setting, one using shared residual statistics and the other using secure integration of synthetic residuals.
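The additive structure mentioned in this abstract can be illustrated for a single predictor. The following is a minimal sketch, not the authors' implementation: each agency computes sums over its own records, and the pooled sums determine the least squares estimates and the error variance. The function names (`local_stats`, `combine`, `fit`) are hypothetical, and the plain addition in `combine` is only a stand-in for the secure summation protocol, which is not implemented here.

```python
def local_stats(xs, ys):
    """Sums one agency computes on its own records for y = b0 + b1 * x."""
    return (
        len(xs),                              # n
        sum(xs),                              # sum of x
        sum(x * x for x in xs),               # sum of x^2
        sum(ys),                              # sum of y
        sum(x * y for x, y in zip(xs, ys)),   # sum of x*y
        sum(y * y for y in ys),               # sum of y^2
    )


def combine(stats_list):
    """Add the local statistics component-wise.

    Assumption: in the secure setting each component would be obtained
    through secure summation; here it is an ordinary sum.
    """
    return tuple(sum(component) for component in zip(*stats_list))


def fit(stats):
    """Solve the 2x2 normal equations from pooled sums.

    Requires more than two records and at least two distinct x values.
    """
    n, sx, sxx, sy, sxy, syy = stats
    det = n * sxx - sx * sx
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    sse = syy - b0 * sy - b1 * sxy   # residual sum of squares
    sigma2 = sse / (n - 2)           # error variance estimate
    return b0, b1, sigma2


# Three hypothetical agencies holding different records on the same variables.
agencies = [
    ([1, 2, 3], [3, 5, 8]),
    ([4, 5], [9, 11]),
    ([6, 7, 8], [13, 14, 17]),
]
pooled = combine([local_stats(xs, ys) for xs, ys in agencies])
b0, b1, sigma2 = fit(pooled)
print(b0, b1, sigma2)
```

Because every quantity in `fit` depends on the data only through sums, the result matches what a regression on the physically integrated records would give.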

Knowledge Discovery and Data Mining | 2004

Privacy preserving regression modelling via distributed computation

Ashish P. Sanil; Alan F. Karr; Xiaodong Lin

Reluctance of data owners to share their possibly confidential or proprietary data with others who own related databases is a serious impediment to conducting a mutually beneficial data mining analysis. We address the case of vertically partitioned data -- multiple data owners/agencies each possess a few attributes of every data record. We focus on the case of the agencies wanting to conduct a linear regression analysis with complete records without disclosing values of their own attributes. This paper describes an algorithm that enables such agencies to compute the exact regression coefficients of the global regression equation and also perform some basic goodness-of-fit diagnostics while protecting the confidentiality of their data. In more general settings beyond the privacy scenario, this algorithm can also be viewed as a method for the distributed computation of regression analyses.

Statistical Science | 2005

Data Dissemination and Disclosure Limitation in a World Without Microdata: A Risk–Utility Framework for Remote Access Analysis Servers

S. Gomatam; Alan F. Karr; Ashish P. Sanil

Given the public’s ever-increasing concerns about data confidentiality, in the near future statistical agencies may be unable or unwilling, or even may not be legally allowed, to release any genuine microdata—data on individual units, such as individuals or establishments. In such a world, an alternative dissemination strategy is remote access analysis servers, to which users submit requests for output from statistical models fit using the data, but are not allowed access to the data themselves. Analysis servers, however, are not free from the risk of disclosure, especially in the face of multiple, interacting queries. We describe these risks and propose quantifiable measures of risk and data utility that can be used to specify which queries can be answered, and with what output. The risk-utility framework is illustrated for regression models.

The Journal of Clinical Pharmacology | 2009

Phase I Study of the Effect of Gastric Acid pH Modulators on the Bioavailability of Oral Dasatinib in Healthy Subjects

Timothy Eley; Feng R. Luo; Shruti Agrawal; Ashish P. Sanil; James Manning; Tong Li; Anne Blackwood-Chirchir; Richard Bertz

Dasatinib is a tyrosine kinase inhibitor (including BCR‐ABL and the SRC family) that is effective in patients with chronic myeloid leukemia. Dasatinib has pH‐dependent solubility and is bioavailable as an oral formulation. The effect of gastric pH modifiers on dasatinib pharmacokinetics is evaluated in an open‐label, randomized, 3‐period, 3‐treatment crossover study. Twenty‐four healthy subjects receive treatment A (2 doses of dasatinib 50 mg separated by 12 hours), treatment B (famotidine 40 mg given 2 hours after dasatinib 50 mg and 10 hours before another dose of dasatinib 50 mg), and treatment C (30 mL of an antacid containing aluminum/magnesium hydroxides given 2 hours before dasatinib 50 mg and concomitantly with dasatinib 50 mg 12 hours after the previous dasatinib dose); a 7‐day washout separates each treatment period. When famotidine is administered 2 hours after dasatinib, dasatinib exposure is similar to dasatinib administered alone. However, dasatinib exposure is reduced by ∼60% when famotidine is administered 10 hours before dasatinib dosing. In contrast, dasatinib exposure is unchanged when antacid (Maalox) is administered 2 hours before dasatinib; but when the antacid is coadministered with dasatinib, dasatinib exposure is reduced by ∼55% to 58%. This indicates that H2‐receptor antagonists should not be coadministered with dasatinib. Dasatinib may be administered with acid‐neutralizing antacids if the doses are temporally separated by at least 2 hours.

Social Networks | 1995

Models for evolving fixed node networks: model fitting and model testing

Ashish P. Sanil; David Banks; Kathleen M. Carley

Researchers in social networks are becoming increasingly interested in how networks evolve over time. There are theories that bear on the evolution of networks, but virtually no statistical methodology which supports the comparative evaluation of these theories. In this paper, we present explicit probability models for networks that change over time, covering a range of simple but significant qualitative behavior. Maximum likelihood estimates of model parameters which describe the rate of change of the network are derived, and some of their sampling properties are elucidated. To calculate these estimates the researcher must have measurements upon the trajectory of a network, that is, the values of the network at successive time points. We also describe goodness-of-fit tests for assessing model adequacy, and use Newcomb's dataset to illustrate the methodology.

Foundations of Software Engineering | 2005

Applying classification techniques to remotely-collected program execution data

Murali Haran; Alan F. Karr; Alessandro Orso; Adam A. Porter; Ashish P. Sanil

There is an increasing interest in techniques that support measurement and analysis of fielded software systems. One of the main goals of these techniques is to better understand how software actually behaves in the field. In particular, many of these techniques require a way to distinguish, in the field, failing from passing executions. So far, researchers and practitioners have only partially addressed this problem: they have simply assumed that program failure status is either obvious (i.e., the program crashes) or provided by an external source (e.g., the users). In this paper, we propose a technique for automatically classifying execution data, collected in the field, as coming from either passing or failing program runs. (Failing program runs are executions that terminate with a failure, such as a wrong outcome.) We use statistical learning algorithms to build the classification models. Our approach builds the models by analyzing executions performed in a controlled environment (e.g., test cases run in-house) and then uses the models to predict whether execution data produced by a fielded instance were generated by a passing or failing program execution. We also present results from an initial feasibility study, based on multiple versions of a software subject, in which we investigate several issues vital to the applicability of the technique. Finally, we present some lessons learned regarding the interplay between the reliability of classification models and the amount and type of data collected.

Statistics and Computing | 2003

Preserving confidentiality of high-dimensional tabulated data: Statistical and computational issues

Adrian Dobra; Alan F. Karr; Ashish P. Sanil

Dissemination of information derived from large contingency tables formed from confidential data is a major responsibility of statistical agencies. In this paper we present solutions to several computational and algorithmic problems that arise in the dissemination of cross-tabulations (marginal sub-tables) from a single underlying table. These include data structures that exploit sparsity to support efficient computation of marginals and algorithms such as iterative proportional fitting, as well as a generalized form of the shuttle algorithm that computes sharp bounds on (small, confidentiality threatening) cells in the full table from arbitrary sets of released marginals. We give examples illustrating the techniques.
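Iterative proportional fitting, one of the algorithms named in this abstract, can be sketched for the two-way case. This is a minimal dense-table illustration under stated assumptions, not the paper's sparse implementation: the function name `ipf`, its parameters, and the stopping rule are hypothetical, and the row and column targets are assumed to share the same total.

```python
def ipf(seed, row_targets, col_targets, max_iters=100, tol=1e-9):
    """Rescale a two-way table until its margins match the targets.

    seed        -- list of rows with nonnegative starting values
    row_targets -- desired row sums
    col_targets -- desired column sums (same grand total as row_targets)
    """
    table = [list(map(float, row)) for row in seed]
    for _ in range(max_iters):
        # Row step: scale each row to its target sum.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                factor = target / total
                table[i] = [value * factor for value in table[i]]
        # Column step: scale each column to its target sum.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
        # Columns now match; stop once the rows match as well.
        row_error = max(
            abs(sum(table[i]) - row_targets[i]) for i in range(len(row_targets))
        )
        if row_error < tol:
            break
    return table


# Hypothetical released margins for a 2 x 3 table, starting from a uniform seed.
fitted = ipf([[1, 1, 1], [1, 1, 1]], [30, 70], [20, 30, 50])
for row in fitted:
    print(row)
```

With a uniform seed the fitted table is the independence table implied by the two margins; a non-uniform seed would carry its own association structure into the result.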

Communications of the ACM | 2003

Table servers protect confidentiality in tabular data releases

Alan F. Karr; Adrian Dobra; Ashish P. Sanil

Federal statistical agencies must balance concern over confidentiality of data with their obligation to report information to the public. Advances in IT threaten privacy, but new technologies can also protect confidentiality while meeting user needs in innovative ways.

Chance | 2004

Analysis of Integrated Data without Data Integration

Alan F. Karr; Xiaodong Lin; Ashish P. Sanil

Many scientific and policy investigations require statistical analyses that “integrate” data stored in multiple, distributed databases. For example, a regression analysis on integrated state databases about factors influencing student performance would be more insightful than individual analyses, or complementary to them. Other contexts where the same need arises range from homeland security to environmental monitoring. At the same time, the barriers to actually integrating the databases are numerous. One is confidentiality: the database holders (we term them “agencies”) almost always wish to protect the identities of their data subjects. Another is regulation: the agencies may be forbidden by law to share their data, either with each other or with a trusted third party. A third is scale: despite advances in networking technology, the only way to move a terabyte of data from point A today to point B tomorrow is FedEx. The good news is that for many analyses it is not necessary to move the data. Instead, using techniques from computer science known generically as secure multiparty computation, the agencies can share summaries of the data anonymously, but in a way that the analysis can be performed in a statistically principled manner. In this article we illustrate linear regression on “horizontally partitioned data.” Only one concept is needed, that of secure summation, which is shown in Figure 1. There are other approaches to this problem for lower risk situations, as ...
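Figure 1 is not reproduced on this page. As a rough illustration of the idea, the sketch below simulates secure summation in its commonly described ring form: the first agency adds a random mask to its value, each subsequent agency adds its own value to the running total it receives, and the first agency removes the mask at the end. The function name `secure_sum` and its parameters are hypothetical, and the sketch assumes nonnegative integer inputs, a true total below the modulus, and agencies that follow the protocol without colluding.

```python
import secrets


def secure_sum(local_values, modulus):
    """Simulate one pass of ring-based secure summation.

    local_values -- one nonnegative integer per agency, in ring order
    modulus      -- public integer assumed to exceed the true total
    """
    # Agency 1 draws a secret mask and sends its masked value onward.
    mask = secrets.randbelow(modulus)
    running = (mask + local_values[0]) % modulus

    # Each later agency sees only the masked running total,
    # adds its own value, and passes the result on.
    for value in local_values[1:]:
        running = (running + value) % modulus

    # The total returns to agency 1, which removes its mask.
    return (running - mask) % modulus


# Three hypothetical agencies with local counts 12, 7, and 30.
total = secure_sum([12, 7, 30], modulus=10**6)
print(total)
```

Applying the same routine to each entry of the locally computed regression summaries yields the pooled quantities without any agency revealing its own contribution.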

Journal of Computer-aided Molecular Design | 2005

Secure analysis of distributed chemical databases without data integration.

Alan F. Karr; Jun Feng; Xiaodong Lin; Ashish P. Sanil; S. Stanley Young

We present a method for performing statistically valid linear regressions on the union of distributed chemical databases that preserves confidentiality of those databases. The method employs secure multi-party computation to share local sufficient statistics necessary to compute least squares estimators of regression coefficients, error variances and other quantities of interest. We illustrate our method with an example containing four companies’ rather different databases.
