Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where George T. Duncan is active.

Publication


Featured research published by George T. Duncan.


Journal of the American Statistical Association | 1986

Disclosure-Limited Data Dissemination

George T. Duncan; Diane Lambert

Statistical agencies use a variety of disclosure control policies with ad hoc justification in disseminating data. The issues involved are clarified here by showing that several of these policies are special cases of a general disclosure-limiting (DL) approach based on predictive distributions and uncertainty functions. A user's information posture regarding a target is represented by one predictive distribution before data release and another predictive distribution after data release. A user's lack of knowledge about the target at any time is measured by an uncertainty function applied either to the current predictive distribution or to the current predictive distribution and the previously held predictive distribution. Common disclosure control policies, such as requiring released cell relative frequencies to be bounded away from both zero and one, are shown to be equivalent to disclosure rules that allow data release only if specific uncertainty functions at particular predictive distribution...
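A minimal sketch of the disclosure-limiting idea described in this abstract, assuming Shannon entropy as the uncertainty function (one possible choice; the paper treats a general class of uncertainty functions). The function names, thresholds, and distributions below are illustrative, not taken from the paper.

```python
# Sketch: an uncertainty function (here Shannon entropy) is applied to a user's
# predictive distribution over a target before and after a hypothetical data
# release; release is allowed only if enough uncertainty remains.
import math

def entropy(dist):
    """Shannon entropy (in nats) of a discrete predictive distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def allow_release(prior, posterior, min_entropy=0.5, max_entropy_loss=0.5):
    """Release only if enough uncertainty remains after release, both in
    absolute terms and relative to the uncertainty held before release.
    Thresholds are illustrative."""
    return (entropy(posterior) >= min_entropy
            and entropy(prior) - entropy(posterior) <= max_entropy_loss)

# Intruder's beliefs about a target's sensitive category before/after release.
prior = {"A": 0.5, "B": 0.3, "C": 0.2}
posterior = {"A": 0.9, "B": 0.07, "C": 0.03}   # the release sharpens beliefs

print(entropy(prior), entropy(posterior), allow_release(prior, posterior))
```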


Journal of Business & Economic Statistics | 1989

The Risk of Disclosure for Microdata

George T. Duncan; Diane Lambert

Statistical agencies that provide microdata for public use strive to keep the risk of disclosure of confidential information negligible. Assessing the magnitude of the risk of disclosure is not easy, however. Whether a data user or intruder attempts to obtain confidential information from a public-use file depends on the perceived costs of identifying a record, the perceived probability of success, and the information expected to be gained. In this article, a decision-theoretic framework for risk assessment is developed that includes the intruder's objectives and strategy for compromising the database and the information gained by the intruder. Two kinds of microdata disclosure are distinguished: disclosure of a respondent's identity and disclosure of a respondent's attributes as a result of an unauthorized identification. A formula for the risk of identity disclosure is given, and a simple approximation to it is evaluated.
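A hedged sketch of the decision-theoretic framing in this abstract: an intruder weighs the perceived cost of attempting an identification against the perceived probability of success and the value of the information gained. The paper's actual risk formula is not reproduced here; the function names and numbers are purely illustrative.

```python
def expected_net_gain(p_success, value_of_information, attempt_cost):
    """Expected payoff to an intruder from attempting to identify a record."""
    return p_success * value_of_information - attempt_cost

def intruder_attacks(p_success, value_of_information, attempt_cost):
    """Assume the intruder attacks only when the expected payoff is positive."""
    return expected_net_gain(p_success, value_of_information, attempt_cost) > 0

# Lowering the success probability (e.g. by coarsening the microdata) can push
# the expected payoff below zero and deter the attempt.
print(intruder_attacks(p_success=0.30, value_of_information=100.0, attempt_cost=20.0))  # True
print(intruder_attacks(p_success=0.10, value_of_information=100.0, attempt_cost=20.0))  # False
```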


Conference on Information and Knowledge Management | 2006

Incremental hierarchical clustering of text documents

Nachiketa Sahoo; Jamie Callan; Ramayya Krishnan; George T. Duncan; Rema Padman

Incremental hierarchical text document clustering algorithms are important in organizing documents generated from streaming online sources such as newswires and blogs. However, this is a relatively unexplored area in the text document clustering literature. Popular incremental hierarchical clustering algorithms, namely Cobweb and Classit, have not been widely used with text document data. We discuss why, in their current form, these algorithms are not suitable for text clustering and propose an alternative formulation that includes changes to the underlying distributional assumption of the algorithm in order to conform to the data. Both the original Classit algorithm and our proposed algorithm are evaluated using Reuters newswire articles and the Ohsumed dataset.
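Purely as an illustration of the incremental (one pass, one document at a time) setting this abstract refers to, and not of Cobweb, Classit, or the authors' proposed variant, here is a toy single-pass clusterer that attaches each arriving document vector to the nearest existing cluster or starts a new one. The similarity threshold and data are invented.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def incremental_cluster(doc_vectors, threshold=0.5):
    """Assign documents to flat clusters as they stream in, one at a time."""
    centroids, assignments = [], []
    for v in doc_vectors:
        sims = [cosine(v, c) for c in centroids]
        if sims and max(sims) >= threshold:
            k = int(np.argmax(sims))
            centroids[k] = centroids[k] + v        # update running (unnormalized) centroid
        else:
            k = len(centroids)                     # start a new cluster
            centroids.append(v.astype(float))
        assignments.append(k)
    return assignments

docs = np.array([[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.8, 0.2]], dtype=float)
print(incremental_cluster(docs))  # e.g. [0, 0, 1, 1]
```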


Journal of the American Statistical Association | 2000

Optimal Disclosure Limitation Strategy in Statistical Databases: Deterring Tracker Attacks through Additive Noise

George T. Duncan; Sumitra Mukherjee

Disclosure limitation methods transform statistical databases to protect confidentiality, a practical concern of statistical agencies. A statistical database responds to queries with aggregate statistics. The database administrator should maximize legitimate data access while keeping the risk of disclosure below an acceptable level. Legitimate users seek statistical information, generally in aggregate form; malicious users (the data snoopers) attempt to infer confidential information about an individual data subject. Tracker attacks are of special concern for databases accessed online. This article derives optimal disclosure limitation strategies under tracker attacks for the important case of data masking through additive noise. Operational measures of the utility of data access and of disclosure risk are developed. The utility of data access is expressed so that trade-offs can be made between the quantity and the quality of data to be released. Application is made to Ohio data from the 1990 census. The article derives conditions under which an attack by a data snooper is better thwarted by a combination of query restriction and data masking than by either disclosure limitation method separately. Data masking by independent noise addition and data perturbation are considered as extreme cases in the continuum of data masking using positively correlated additive noise. Optimal strategies are established for the data snooper. Circumstances are determined under which adding autocorrelated noise is preferable to using existing methods of either independent noise addition or data perturbation. Both moving average and autoregressive noise addition are considered.
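A small simulation sketch of the tracker-attack setting this abstract analyzes: a statistical database answers SUM queries with additive noise, and a snooper tries to isolate one record's confidential value by differencing two overlapping query sets and averaging repeated answers. Parameter values are illustrative only; the paper's optimal-noise derivations are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
salaries = rng.normal(50_000, 10_000, size=200)   # confidential values
target = 0                                        # index of the record under attack

def noisy_sum(indices, sigma=5_000.0):
    """Answer a SUM query with independent additive noise of std dev sigma."""
    return salaries[indices].sum() + rng.normal(0.0, sigma)

# Tracker attack: (sum over everyone) - (sum over everyone except the target)
# differs from the target's value only by the noise in the two answers.
everyone = np.arange(len(salaries))
everyone_but_target = everyone[everyone != target]

estimates = [noisy_sum(everyone) - noisy_sum(everyone_but_target) for _ in range(500)]
print(f"true value: {salaries[target]:.0f}, attack estimate: {np.mean(estimates):.0f}")
# With independent noise the snooper's averaged estimate converges toward the
# true value; the article's point is that correlated (e.g. autoregressive)
# noise, or combining masking with query restriction, can block exactly this.
```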


American Journal of Political Science | 1978

The Dynamics of Warfare: 1816-1965

William W. Davis; George T. Duncan; Randolph M. Siverson

Stochastic models are constructed to illuminate the dynamic incidence of international warfare during the period from 1816 to 1965. It is argued that the probabilistic structure of this incidence is revealed most clearly through an analysis based on dyads of nations, thereby disassembling multilateral wars such as World War II. The conceptual focus is on a clear delineation of heterogeneity, both over time and over actors, and of contagion, differentiating its addictive and infectious varieties. Departures from randomness are considered as modifications of the Poisson process, and empirical tests of the assumptions are made by analysis of this process. A conclusion of positive infection is supported by the data. An autoregressive model of order 4 is found to fit the interarrival times adequately and account for the infectious behavior. This suggests an infectious impact of moderate time duration.
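As a hedged illustration only (invented data, not the 1816-1965 war data or the paper's models), the sketch below shows the two kinds of checks the abstract mentions: an index-of-dispersion check for departure from a Poisson process and an autoregressive fit of order 4 to interarrival times, both via plain least squares.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical yearly counts of dyadic war onsets. For a Poisson process the
# index of dispersion (variance / mean) is near 1; values above 1 suggest
# contagion-like clustering.
counts = rng.poisson(0.8, size=150) + rng.binomial(1, 0.1, size=150) * rng.poisson(2, size=150)
dispersion = counts.var(ddof=1) / counts.mean()
print(f"index of dispersion: {dispersion:.2f}")

def fit_ar(series, order=4):
    """Fit an AR(order) model to a series by ordinary least squares."""
    y = series[order:]
    X = np.column_stack([series[order - k - 1: len(series) - k - 1] for k in range(order)])
    X = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs            # intercept followed by lag-1..lag-order coefficients

interarrival = rng.exponential(1.0, size=300)   # stand-in interarrival times
print(fit_ar(interarrival, order=4))
```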


Archive | 2001

Forecasting Analogous Time Series

George T. Duncan; Wilpen Gorr; Janusz Szczypula

Organizations that use time-series forecasting on a regular basis generally use it for many products or services. Among the variables they forecast are groups of analogous time series (series that follow similar, time-based patterns). Their covariation is a largely untapped source of information that can improve forecast accuracy. We take the Bayesian pooling approach to drawing information from analogous time series to model and forecast a given time series. In using Bayesian pooling, we use data from analogous time series as multiple observations per time period in a group-level model. We then combine estimated parameters of the group model with conventional time-series-model parameters, using so-called shrinkage weights. Major benefits of this approach are that it (1) requires few parameters for estimation; (2) builds directly on conventional time-series models; (3) adapts to pattern changes in time series, providing rapid adjustments and accurate model estimates; and (4) screens out adverse effects of outlier data points on time-series model estimates. For practitioners, we provide the terms, concepts, and methods necessary for a basic understanding of Bayesian pooling and the conditions under which it improves upon conventional time-series methods. For researchers, we describe the experimental data, treatments, and factors needed to compare the forecast accuracy of pooling methods. Last, we present basic principles for applying pooling methods and supporting empirical results. Conditions favoring pooling include time series with high volatility and outliers. Simple pooling methods are more accurate than complex methods, and we recommend manual intervention for cases with few time series.
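A minimal sketch, under simplifying assumptions, of the shrinkage step described above: a trend estimated from one short series is combined with a group-level trend from averaged analogous series, with weights proportional to the precision (inverse variance) of each estimate. This generic precision-weighted average is not necessarily the exact formula the authors use, and all data are simulated.

```python
import numpy as np

def trend_and_variance(y):
    """OLS slope of y on time, together with the sampling variance of the slope."""
    t = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    var_slope = resid.var(ddof=2) / ((t - t.mean()) ** 2).sum()
    return slope, var_slope

def pooled_trend(local_series, analogous_series):
    """Shrink the local trend toward the trend of the averaged analogous series."""
    b_local, v_local = trend_and_variance(local_series)
    b_group, v_group = trend_and_variance(analogous_series.mean(axis=0))
    w = (1 / v_local) / (1 / v_local + 1 / v_group)   # precision weight on the local estimate
    return w * b_local + (1 - w) * b_group

rng = np.random.default_rng(2)
analogous = 2.0 * np.arange(24) + rng.normal(0, 5, size=(10, 24))  # ten analogous series
local = 2.0 * np.arange(24) + rng.normal(0, 20, size=24)           # one noisy local series
print(f"pooled trend estimate: {pooled_trend(local, analogous):.2f}")  # should land near the true slope of 2
```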


Information Systems Research | 2012

Research Note - The Halo Effect in Multicomponent Ratings and Its Implications for Recommender Systems: The Case of Yahoo! Movies

Nachiketa Sahoo; Ramayya Krishnan; George T. Duncan; Jamie Callan

Collaborative filtering algorithms learn from the ratings of a group of users on a set of items to find personalized recommendations for each user. Traditionally they have been designed to work with one-dimensional ratings. With interest growing in recommendations based on multiple aspects of items, we present an algorithm for using multicomponent rating data. The presented mixture model-based algorithm uses the component rating dependency structure discovered by a structure learning algorithm. The structure is supported by the psychometric literature on the halo effect. This algorithm is compared with a set of model-based and instance-based algorithms for single-component ratings and their variations for multicomponent ratings. We evaluate the algorithms using data from Yahoo! Movies. Use of multiple components leads to significant improvements in recommendations. However, we find that the choice of algorithm depends on the sparsity of the training data. It also depends on whether the task of the algorithm is to accurately predict ratings or to retrieve relevant items. In our experiments a model-based multicomponent rating algorithm is able to better retrieve items when training data are sparse. However, if the training data are not sparse, or if we are trying to predict the rating values accurately, then the instance-based multicomponent rating collaborative filtering algorithms perform better. Beyond generating recommendations we show that the proposed model can fill in missing rating components. Theories in the psychometric literature and the empirical evidence suggest that rating specific aspects of a subject is difficult. Hence, filling in the missing component values leads to the possibility of a rater support system to facilitate the gathering of multicomponent ratings.
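A toy sketch (not the paper's mixture model) of the underlying idea that component ratings are strongly tied to the overall rating (the halo effect), so a missing component can be filled in from the ratings that were given. The ratings and the 13-point scale below are fabricated for illustration.

```python
import numpy as np

# columns: story, acting, visuals, overall (illustrative 13-point scale)
R = np.array([
    [10, 11,  9, 10],
    [ 4,  5,  6,  5],
    [12, 12, 11, 12],
    [ 7,  6,  8,  7],
    [ 3,  4,  2,  3],
], dtype=float)

def impute_coefficients(R, target_col, predictor_col=3):
    """Fit component ~ a + b * overall by least squares and return (a, b)."""
    X = np.column_stack([np.ones(len(R)), R[:, predictor_col]])
    coef, *_ = np.linalg.lstsq(X, R[:, target_col], rcond=None)
    return coef

a, b = impute_coefficients(R, target_col=0)
print(f"imputed 'story' rating for a movie rated overall=9: {a + b * 9:.1f}")
```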


Technometrics | 1978

An Empirical Study of Jackknife-Constructed Confidence Regions in Nonlinear Regression

George T. Duncan

A jackknife procedure for constructing confidence regions in nonlinear regression is examined using Monte Carlo simulation. The jackknife promises to be asymptotically double-edged, being both independent of linearizing approximations to the regression surface and insensitive to specification of the error distribution. For moderate sample sizes, however, the jackknife cannot be trusted in establishing joint confidence regions.
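A brief sketch of the delete-one jackknife for nonlinear regression parameters of the kind the study evaluates: refit the model with each observation left out, form pseudo-values, and use their mean and covariance for a confidence region. The exponential model and simulated data below are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    """Illustrative nonlinear regression function."""
    return a * np.exp(b * x)

rng = np.random.default_rng(3)
x = np.linspace(0, 2, 30)
y = model(x, 2.0, 1.1) + rng.normal(0, 0.5, size=x.size)

theta_full, _ = curve_fit(model, x, y, p0=[1.0, 1.0])

n = len(x)
pseudo = []
for i in range(n):
    mask = np.arange(n) != i
    theta_i, _ = curve_fit(model, x[mask], y[mask], p0=theta_full)
    pseudo.append(n * theta_full - (n - 1) * theta_i)   # jackknife pseudo-values
pseudo = np.array(pseudo)

theta_jack = pseudo.mean(axis=0)                     # jackknife point estimate
cov_jack = np.cov(pseudo, rowvar=False) / n          # covariance for a confidence region
print(theta_jack, cov_jack)
```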


IEEE Symposium on Security and Privacy | 1991

Microdata disclosure limitation in statistical databases: query size and random sample query control

George T. Duncan; Sumitra Mukherjee

A probabilistic framework can be used to assess the risk of disclosure of confidential information in statistical databases that use disclosure control mechanisms. The authors show how the method may be used to assess the strengths and weaknesses of two existing disclosure control mechanisms: the query set size restriction control and random sample query control mechanisms. Results indicate that neither scheme provides adequate security. The framework is then further exploited to analyze an alternative scheme combining query set size restriction and random sample query control. It is shown that this combination results in a significant decrease in the risk of disclosure.
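A compact sketch of the two controls analyzed in the paper and of their combination: refuse queries whose query set is too small (or whose complement is too small), and evaluate permitted queries over a random subsample of the query set rather than the full set. Parameter choices and the scaling of the subsample total are illustrative.

```python
import random

def answer_query(records, query_set, k=3, sampling_rate=0.8, seed=None):
    """Return a SUM answer under query-set-size restriction plus random
    sample query control, or None if the query is refused."""
    n = len(records)
    if len(query_set) < k or len(query_set) > n - k:
        return None                                   # size restriction: refuse
    rng = random.Random(seed)
    sample = [i for i in query_set if rng.random() < sampling_rate]
    if not sample:
        return None
    # Scale the subsample total back up to the size of the full query set.
    return sum(records[i] for i in sample) * len(query_set) / len(sample)

salaries = [52_000, 61_000, 45_000, 70_000, 58_000, 49_000, 66_000, 53_000, 60_000, 47_000]
print(answer_query(salaries, query_set=range(2)))           # refused: query set too small
print(answer_query(salaries, query_set=range(6), seed=1))   # answered on a random subsample
```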


International Studies Quarterly | 1982

Flexibility of Alliance Partner Choice in a Multipolar System: Models and Tests

George T. Duncan; Randolph M. Siverson

International system theorists usually hypothesize great flexibility of alliance partner choice among the major powers in a multipolar system. To test for the existence of such flexibility, three statistically testable hypotheses of alliance partner choice in a multipolar system are derived. Log-linear model procedures are developed for testing hypotheses of dyadic independence and homogeneity in alliance partner choice. A Markov chain model provides the framework to test the hypothesis of random temporal sequencing of alliance choice. Two data sets giving international alliance choices of the major powers are used to test these models. One data set contains only formal, i.e., written, alliances entered into between 1815 and 1913, and each of the three models is found to be consistent with these data. In the case of the second data set (containing formal and informal alliances entered into between 1814 and 1913), the dyadic independence and homogeneity models are rejected, but a random sequential choice model is accepted. Differences among data sets are discussed, and it is concluded that the formal alliances more accurately reflect the structure of the major-power international system, and thus all three hypotheses are acceptable for the system.
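As a hedged illustration of the kind of homogeneity test described above, performed here with a plain chi-square statistic rather than the article's log-linear and Markov chain machinery, and with invented counts rather than the 1815-1913 alliance data:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: major power, columns: period; cell = number of new alliance partners
# chosen. Homogeneity means choice behavior does not differ across periods.
table = np.array([
    [3, 2, 4, 1],
    [2, 2, 3, 2],
    [4, 3, 5, 3],
    [1, 2, 2, 1],
])
chi2, p, dof, expected = chi2_contingency(table)
print(f"homogeneity of alliance choice across periods: chi2={chi2:.2f}, p={p:.3f}")
```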

Collaboration


Dive into George T. Duncan's collaborations.

Top Co-Authors

Ramayya Krishnan, Carnegie Mellon University
Rema Padman, Carnegie Mellon University
Sumitra Mukherjee, Nova Southeastern University
Mark Elliot, University of Manchester
Wilpen Gorr, Carnegie Mellon University
Janusz Szczypula, Carnegie Mellon University
Sanda Kaufman, Cleveland State University