
Publications


Featured research published by George Diehr.


SIAM Journal on Scientific and Statistical Computing | 1985

Evaluation of a Branch and Bound Algorithm for Clustering

George Diehr

A branch and bound algorithm for optimal clustering is developed and applied to a variety of test problems. The objective function is minimization of the within-group sum-of-squares, although the algorithm can be applied to any loss function meeting certain conditions. The algorithm is based on earlier work of Koontz et al. (1975). The efficiency of the method for determining optimal solutions is studied as a function of problem size, number of clusters, and the underlying degree of separability of the observations. The value of the approach in determining lower bounds is also investigated. We conclude that the method is practical for problems of up to 100 or so observations if the number of clusters is about six or fewer and the clusters are reasonably well separated. If separation is poor and/or a larger number of clusters is sought, computing time increases significantly. The approach provides very tight lower bounds early in the enumeration for problems with moderate separation and six or fewer clusters.
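
The key fact behind this style of branch and bound (following Koontz et al.) is that the within-group sum-of-squares of a partial assignment can only grow as more observations are assigned, so it is a valid lower bound for pruning. The sketch below is an illustrative reconstruction under that assumption, not the paper's implementation; the function and variable names are hypothetical.

```python
import math

def wgss(points, assignment, k):
    # Within-group sum-of-squares for the points assigned so far.
    total = 0.0
    for c in range(k):
        cluster = [p for p, a in zip(points, assignment) if a == c]
        if not cluster:
            continue
        dim = len(cluster[0])
        centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
        total += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim))
                     for p in cluster)
    return total

def branch_and_bound(points, k):
    n = len(points)
    best = (math.inf, None)

    def recurse(assignment):
        nonlocal best
        # Lower bound: WGSS of a partial assignment is nondecreasing as
        # further points are added, so it bounds any completion from below.
        bound = wgss(points[:len(assignment)], assignment, k)
        if bound >= best[0]:
            return  # prune this branch
        if len(assignment) == n:
            best = (bound, list(assignment))
            return
        for c in range(k):
            recurse(assignment + [c])

    recurse([])
    return best
```

On two well-separated pairs, e.g. `[(0, 0), (0, 1), (5, 5), (5, 6)]` with `k = 2`, the optimal WGSS is 1.0, and the pruning keeps most of the 2^4 assignments from being fully expanded.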


Annals of Operations Research | 1994

Multiple-type, two-dimensional bin packing problems: Applications and algorithms

Bernard T. Han; George Diehr; Jack S. Cook

In this paper we consider a class of bin selection and packing problems (BPP) in which potential bins are of various types, have two resource constraints, and the resource requirement for each object differs by bin type. The problem is to select bins and assign the objects to bins so as to minimize the sum of bin costs while meeting the two resource constraints. This problem extends the classical two-dimensional BPP, in which bins are homogeneous. Typical applications of this research include computer storage device selection with file assignment, robot selection with work station assignment, and computer processor selection with task assignment. Three solution algorithms were developed and tested: a simple greedy heuristic, a method based on simulated annealing (SA), and an exact algorithm based on column generation with branch and bound (CG). An LP-based method for generating tight lower bounds (LB) was also developed. Several hundred test problems based on computer storage device selection and file assignment were generated and solved. The heuristic solved problems of up to 100 objects in less than a second; the average solution value was within about 3% of the optimum. SA improved solutions to an average gap of less than 1%, but with a significant increase in computing time. LB produced average lower bounds within 3% of optimum within a few seconds. CG is practical for small to moderately sized problems, possibly as many as 50 objects.
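
To make the problem structure concrete, here is a minimal first-fit-style greedy sketch for the multiple-type, two-resource setting, where each object's requirement depends on the bin type it lands in. This is an assumed illustration of the problem class, not the paper's greedy heuristic; the data layout (`cost`, `cap`, `req`) is hypothetical.

```python
def greedy_pack(objects, bin_types):
    """objects: list of ids; bin_types: list of dicts with 'name', 'cost',
    'cap' (a pair of capacities), and 'req' mapping object -> (r1, r2),
    the type-dependent two-dimensional requirement."""
    open_bins = []  # each entry: [type_index, used1, used2, contents]
    for obj in objects:
        placed = False
        for b in open_bins:
            t = bin_types[b[0]]
            r1, r2 = t['req'][obj]
            # first open bin where both resource constraints still hold
            if b[1] + r1 <= t['cap'][0] and b[2] + r2 <= t['cap'][1]:
                b[1] += r1; b[2] += r2; b[3].append(obj)
                placed = True
                break
        if not placed:
            # open the cheapest bin type that can hold the object alone
            feasible = [(t['cost'], i) for i, t in enumerate(bin_types)
                        if t['req'][obj][0] <= t['cap'][0]
                        and t['req'][obj][1] <= t['cap'][1]]
            _, i = min(feasible)
            r1, r2 = bin_types[i]['req'][obj]
            open_bins.append([i, r1, r2, [obj]])
    total_cost = sum(bin_types[b[0]]['cost'] for b in open_bins)
    return open_bins, total_cost
```

A greedy pass like this is fast (linear in objects times open bins), which is consistent with the sub-second times reported for 100-object problems; SA and column generation then trade time for solution quality.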


Technometrics | 1974

Approximating the Distribution of the Sample R² in Best Subset Regressions

George Diehr; Donald R. Hoflin

This note presents research on the problem of determining the distribution of the usual sample R² statistic in multiple regression studies where the variables included in the regression equation are the subset of k variables, from a set of m variables, which maximizes the sample R² value or satisfies some similar criterion. A Monte Carlo approach was used to estimate certain percentile points of the distribution of R² under the null hypothesis of independence between the dependent variable and the m independent variables. A function has been developed which appears to provide a good approximation to percentile points of the R² distribution.
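
The Monte Carlo idea can be sketched for the simplest case, k = 1, where the best-subset R² is just the largest squared sample correlation over the m candidate predictors. This is an assumed illustration of the simulation approach, not the paper's code; function names and parameters are hypothetical.

```python
import random
from statistics import fmean

def best_single_r2(n, m, rng=random):
    # One Monte Carlo replicate under the null: y and m candidate
    # predictors are independent N(0,1); return the largest R^2 over
    # all single-variable regressions (k = 1), i.e. the largest
    # squared sample correlation.
    y = [rng.gauss(0, 1) for _ in range(n)]
    ybar = fmean(y)
    syy = sum((v - ybar) ** 2 for v in y)
    best = 0.0
    for _ in range(m):
        x = [rng.gauss(0, 1) for _ in range(n)]
        xbar = fmean(x)
        sxx = sum((v - xbar) ** 2 for v in x)
        sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
        best = max(best, sxy * sxy / (sxx * syy))
    return best

def null_percentile(n, m, reps, q, seed=1):
    # Estimate the q-th percentile of the null distribution of the
    # maximized R^2 from `reps` replicates.
    rng = random.Random(seed)
    vals = sorted(best_single_r2(n, m, rng) for _ in range(reps))
    return vals[min(reps - 1, int(q * reps))]
```

Because the maximum over m predictors is stochastically larger than any single R², naive use of standard R² tables is anti-conservative; simulated percentiles like these correct for the selection.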


Communications of the ACM | 1984

Optimal pagination of B-trees with variable-length items

George Diehr; Bruce Faaland

Two algorithms are developed for the optimal organization of B-trees (and variations) with variable-length items. The first algorithm solves a problem posed by McCreight: finding a pagination of n items that minimizes the sum of key lengths promoted to the next higher level of the tree. The algorithm requires O(n log n) time and O(n) space. The second algorithm constructs the minimum-depth tree from the n items in O(n³ log n) time. Both methods rely on dynamic programming arguments and can be interpreted as shortest-path problems. Practical approaches for implementing the algorithms are discussed.
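
The shortest-path interpretation of the pagination problem can be illustrated with a simple quadratic dynamic program: each page is an edge from its first item to the item after its last, with cost equal to the promoted key length. This sketch assumes every item fits on a page by itself and runs in O(n · max items per page) time; the paper's algorithm achieves O(n log n). Names are hypothetical.

```python
import math

def optimal_pagination(lengths, keylens, capacity):
    # lengths[i]: size of item i on a page; keylens[i]: length of item
    # i's key. Split items 0..n-1 into consecutive pages, each fitting
    # within `capacity`. Every page except the first promotes its first
    # item's key to the next level; minimize total promoted key length.
    n = len(lengths)
    best = [math.inf] * (n + 1)
    cut = [-1] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        size = 0
        for j in range(i - 1, -1, -1):  # try a last page holding items j..i-1
            size += lengths[j]
            if size > capacity:
                break
            promoted = keylens[j] if j > 0 else 0  # first page promotes nothing
            if best[j] + promoted < best[i]:
                best[i] = best[j] + promoted
                cut[i] = j
    pages, i = [], n  # recover page boundaries as (start, end) pairs
    while i > 0:
        pages.append((cut[i], i))
        i = cut[i]
    return best[n], pages[::-1]
```

For items of length 3 with keys of length 5, 1, 2, 4 and capacity 6, the DP puts the long-keyed items inside pages and promotes only the length-2 key.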


IEEE Transactions on Knowledge and Data Engineering | 1994

Estimating block accesses in database organizations

George Diehr; Aditya N. Saharia

The exact expression for the expected number of disk accesses required to retrieve a given number of records, called the Yao function, requires iterative computation. Several authors have developed approximations to the Yao function, all of which have substantial errors in some situations. We derive and evaluate simple upper and lower bounds that never differ by more than a small fraction of a disk access.
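
For context, the exact Yao function for n records spread evenly over m blocks, with k records selected without replacement, is m · (1 − ∏ᵢ₌₁ᵏ (n − n/m − i + 1)/(n − i + 1)): each factor is the probability a given block is still untouched after one more selection. The iterative computation the abstract refers to looks like this (a textbook sketch of the Yao formula, assuming m divides n, not the paper's bounds):

```python
def yao(n, m, k):
    # Expected number of blocks touched when k of n records are selected
    # without replacement, with m blocks of n/m records each.
    p = n // m  # records per block (assumes m divides n)
    prob_untouched = 1.0
    for i in range(1, k + 1):
        num = n - p - i + 1
        if num <= 0:           # more selections than records outside a block:
            prob_untouched = 0.0  # every block is certainly touched
            break
        prob_untouched *= num / (n - i + 1)
    return m * (1.0 - prob_untouched)
```

The k-term product is why closed-form approximations, and the bounds derived in this paper, are attractive when the function must be evaluated many times inside a query optimizer.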


European Journal of Operational Research | 1992

An algorithm for storage device selection and file assignment

Bernard T. Han; George Diehr

Newly available write-once/read-many (WORM) optical storage devices provide the opportunity to store massive amounts of data online at very low cost. However, the slow random access time of the WORM device and its write-once limitation have, in general, restricted its application to archival storage and static files. In this paper we analyze a hybrid storage system which uses a combination of conventional magnetic disks and optical devices for the management of files with moderate volatility. The hybrid system appears to be particularly attractive for databases used for decision support. Cost models and solution algorithms are developed which determine a near-optimum database ‘storage plan’. A storage plan specifies the selection of device types and the assignment of files to devices. The solution approach uses a dynamic programming-based heuristic to obtain an initial solution, followed by a set-covering algorithm which employs column generation both to search for an improved solution and to provide a tight lower bound. Computational results indicate that the method produces solutions with an average suboptimality of less than 1%. More importantly, this research demonstrates that a storage plan which uses a mix of device types can provide cost savings of up to 70% over a storage plan limited to conventional magnetic devices.
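
The intuition behind mixed-device storage plans is that large, rarely accessed files belong on cheap optical storage while small, hot files belong on fast magnetic disk. The toy sketch below illustrates only that per-file trade-off, ignoring the capacity interactions that the paper's DP-plus-set-covering approach handles; all names and cost parameters are hypothetical.

```python
def assign_files(files, devices):
    # files: name -> (size_mb, accesses per period)
    # devices: list of dicts with 'name', 'store_per_mb', 'access_cost'
    # Naive per-file assignment: pick the device type minimizing
    # storage cost + access cost, independently for each file.
    plan = {}
    for name, (size_mb, accesses) in files.items():
        best = min(devices,
                   key=lambda d: d['store_per_mb'] * size_mb
                                 + d['access_cost'] * accesses)
        plan[name] = best['name']
    return plan
```

With illustrative numbers (optical: cheap storage, expensive access; magnetic: the reverse), a large archival file lands on the optical device and a heavily accessed file stays on magnetic disk, which is the qualitative source of the reported savings.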


Information Systems Research | 1990

A Refresh Scheme for Remote Snapshots

Aditya N. Saharia; George Diehr

This article presents a scheme called “Difference Table” for maintaining database snapshots stored at sites remote from a central database and refreshed only upon user request. Database snapshots are currently in widespread use: a subset of the central database is extracted, transmitted to a local workstation, and used for decision support. The Difference Table method checks each update to a central database table against the definition of the snapshot. If the update is relevant, its effect is stored in a difference table. On receiving a refresh request, the contents of the difference table are transmitted to the remote site, where they update the snapshot. The Difference Table scheme allows a selective refresh of the snapshot, in the sense that only the changes to a snapshot since the last refresh are transmitted. We discuss the additional database tables and processes required to support the Difference Table scheme. Performance measures are developed, and both quantitative and qualitative comparisons are made to alternative methods such as full regeneration and the approach used by System R*. By most criteria and in many environments, the Difference Table scheme is preferable to these alternatives. It also has several attractive side benefits which are not available in the alternative methods.
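
The mechanism described above — check each central update against the snapshot predicate, log relevant effects, ship only the log on refresh — can be sketched in a few lines. This is an illustrative in-memory model under assumed names (`Snapshot`, `central_update`, `refresh`), not the paper's database implementation.

```python
class Snapshot:
    # Difference Table idea: updates relevant to the snapshot predicate
    # are logged in `diff`; a refresh transmits only the logged changes.
    def __init__(self, predicate, base_rows):
        self.predicate = predicate
        self.remote = {rid: row for rid, row in base_rows.items()
                       if predicate(row)}
        self.diff = {}  # rid -> latest relevant row, or None for a delete

    def central_update(self, rid, row):
        if self.predicate(row):
            self.diff[rid] = row      # insert/update visible to the snapshot
        elif rid in self.remote or rid in self.diff:
            self.diff[rid] = None     # row left the snapshot: record a delete

    def refresh(self):
        # Transmit only the difference table to the remote site.
        for rid, row in self.diff.items():
            if row is None:
                self.remote.pop(rid, None)
            else:
                self.remote[rid] = row
        sent = len(self.diff)
        self.diff = {}
        return sent
```

The selective-refresh property is visible here: the cost of a refresh is proportional to the number of relevant changes since the last refresh, not to the snapshot size as in full regeneration.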


European Journal of Operational Research | 1993

Optimal file management in a hybrid storage system

Ted Klastorin; Kamran Moinzadeh; George Diehr; Bernard T. Han

In this paper, we analyze a hybrid storage system for data management which combines conventional magnetic disks (MD) with write-once, read-many optical disks (OD), which offer high-density, low-cost storage. Updates of the files stored on the ODs are temporarily stored in differential files on magnetic disks. Periodically, the differential file corresponding to each OD file is copied into free space on the OD to ‘refresh’ that file. After a time interval, the OD files and the most recent differential files are copied onto new optical media; this process is known as file reorganization. Using measures of data file volatility, file sizes, device costs, and the costs of refreshing and reorganizing files, we show how this problem is similar to deterministic inventory problems and, using this analogy, develop a model which indicates how often files should be refreshed and reorganized.
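
The inventory analogy can be made concrete with an EOQ-style calculation: pending updates accumulate like inventory, each refresh is a fixed "ordering" cost, and holding pending updates degrades read performance. The square-root interval below is an illustrative sketch of that analogy under assumed parameter names, not the paper's model.

```python
import math

def optimal_refresh_interval(update_rate, refresh_cost, holding_cost):
    # Updates accumulate in a differential file at `update_rate` per unit
    # time; each refresh costs `refresh_cost`; a pending update incurs
    # `holding_cost` per unit time it waits. Average cost per unit time is
    #   refresh_cost / T + holding_cost * update_rate * T / 2,
    # minimized (as in the classic EOQ model) at:
    return math.sqrt(2 * refresh_cost / (holding_cost * update_rate))
```

As in inventory theory, doubling the fixed refresh cost increases the optimal interval by only √2, which is why moderate-volatility files can still be managed economically on write-once media.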


Journal of Management Information Systems | 1990

Maintaining remote decision support databases

George Diehr; Aditya N. Saharia; David Chao

This research describes and analyzes schemes for managing decision support databases that are extracted from a central database and “downloaded” to personal workstations. Unlike a (true) distributed database system, where updates are propagated to maintain consistency, these remote “snapshots” are updated only periodically (“refreshed”) upon command of the remote workstation user. This approach to data management has many of the same advantages as a distributed database over a centralized database (e.g., reduced communication costs, improved response time for retrievals, and reduced contention), but it avoids the high overhead for concurrency control associated with updating in a distributed database. The added cost is reduced data consistency. The schemes analyzed include full regeneration, the scheme used by System R*, and two new schemes. One new scheme, called modified regeneration, is a variation on simple full regeneration of the snapshot, but transmits only relevant changes to the sna...

Collaboration


Dive into George Diehr's collaborations.

Top Co-Authors

Aditya N. Saharia
University of Illinois at Chicago

Bernard T. Han
Western Michigan University

Bruce Faaland
University of Washington

Earl Hunt
University of Washington

Jack S. Cook
State University of New York System

Ted Klastorin
University of Washington