Network

Latest external collaborations at the country level.

Hotspot

Research topics where Paul M. Dantzig is active.

Publication

Featured research published by Paul M. Dantzig.


International Conference on Computer Communications | 1999

A scalable system for consistently caching dynamic Web data

Jim Challenger; Arun Iyengar; Paul M. Dantzig

This paper presents a new approach for consistently caching dynamic Web data in order to improve performance. Our algorithm, which we call data update propagation (DUP), maintains data dependence information between cached objects and the underlying data which affect their values in a graph. When the system becomes aware of a change to underlying data, graph traversal algorithms are applied to determine which cached objects are affected by the change. Cached objects which are found to be highly obsolete are then either invalidated or updated. DUP was a critical component at the official Web site for the 1998 Olympic Winter Games. By using DUP, we were able to achieve cache hit rates close to 100%, compared with 80% for an earlier version of our system which did not employ DUP. As a result of the high cache hit rates, the Olympic Games Web site was able to serve data quickly even during peak request periods.
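
A minimal sketch of the DUP idea, with illustrative names and structure that are not the paper's implementation: a graph maps each underlying datum to the cached objects that depend on it, and an update triggers a traversal that invalidates or regenerates everything reachable.

```python
from collections import defaultdict, deque

class DUPCache:
    """Toy cache with data update propagation (illustrative sketch only)."""

    def __init__(self):
        self.cache = {}                     # object id -> cached content
        self.dependents = defaultdict(set)  # node id -> cached objects depending on it

    def put(self, obj_id, content, depends_on):
        self.cache[obj_id] = content
        for node in depends_on:             # nodes may be underlying data or other cached objects
            self.dependents[node].add(obj_id)

    def on_update(self, datum, regenerate=None):
        """Traverse the dependence graph from the changed datum and
        invalidate (or regenerate) every affected cached object."""
        frontier, seen = deque([datum]), set()
        while frontier:
            node = frontier.popleft()
            for obj_id in self.dependents.get(node, ()):
                if obj_id in seen:
                    continue
                seen.add(obj_id)
                if regenerate is not None:
                    self.cache[obj_id] = regenerate(obj_id)  # update in place
                else:
                    self.cache.pop(obj_id, None)             # invalidate
                frontier.append(obj_id)  # objects embedding this one are also affected
```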


IEEE Internet Computing | 2000

High performance Web site design techniques

Arun Iyengar; Jim Challenger; Daniel M. Dias; Paul M. Dantzig

This article presents techniques for designing Web sites that need to handle large request volumes and provide high availability. The authors present new techniques they developed for keeping cached dynamic data current and synchronizing caches with underlying databases. Many of these techniques were deployed at the official Web site for the 1998 Olympic Winter Games.
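
The article does not spell out the synchronization mechanism; one hedged reading is a worker that consumes a feed of database change events and pushes each one through the cache's propagation step. The change feed and the propagate callback below are this sketch's assumptions, not the authors' API.

```python
import queue

def cache_sync_worker(change_feed: queue.Queue, propagate) -> None:
    """Keep cached dynamic data current by reacting to database changes."""
    while True:
        changed_datum = change_feed.get()  # blocks until the database reports a change
        if changed_datum is None:          # sentinel value shuts the worker down
            return
        propagate(changed_datum)           # e.g. a DUP-style graph traversal
```

A real deployment would publish these events from database triggers or from the application's write path.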


Conference on High Performance Computing (Supercomputing) | 1998

A Scalable and Highly Available System for Serving Dynamic Data at Frequently Accessed Web Sites

Jim Challenger; Paul M. Dantzig; Arun Iyengar

This paper describes the system and key techniques used for achieving performance and high availability at the official Web site for the 1998 Olympic Winter Games, which was one of the most popular Web sites for the duration of the Olympic Games. The Web site utilized thirteen SP2 systems scattered around the globe containing a total of 143 processors. A key feature of the Web site was that the data being presented to clients was constantly changing. Whenever new results were entered into the system, updated Web pages reflecting the changes were made available to the rest of the world within seconds. One technique we used to serve dynamic data efficiently to clients was to cache dynamic pages so that they only had to be generated once. We developed and implemented a new algorithm we call Data Update Propagation (DUP) which identifies the cached pages that have become stale as a result of changes to underlying data on which the cached pages depend, such as databases. For the Olympic Games Web site, we were able to update stale pages directly in the cache, which obviated the need to invalidate them. This allowed us to achieve cache hit rates of close to 100%. Our system was able to serve pages to clients quickly during the entire Olympic Games, even during peak periods. In addition, the site was available 100% of the time. We describe the key features employed by our site for high availability. We also describe how the Web site was structured to provide useful information while requiring clients to examine only a small number of pages.
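
The update-in-place strategy can be sketched as regeneration on the write path: when a new result is recorded, every dependent page is re-rendered and overwritten in the cache, so readers never take a miss. All names below are hypothetical, not the paper's interfaces.

```python
def publish_result(dependence_graph, page_cache, render, new_result):
    """Write-path sketch: record the new result, then regenerate each
    affected page in place instead of invalidating it."""
    affected = dependence_graph.pages_depending_on(new_result)  # hypothetical lookup
    for page_id in affected:
        page_cache[page_id] = render(page_id)  # overwrite: the next read still hits
```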


IEEE/ACM Transactions on Networking | 2004

Efficiently serving dynamic data at highly accessed web sites

James R. H. Challenger; Paul M. Dantzig; Arun Iyengar; Mark S. Squillante; Li Zhang

We present architectures and algorithms for efficiently serving dynamic data at highly accessed Web sites, together with the results of an analysis motivating our design and quantifying its performance benefits. This includes algorithms for keeping cached data consistent so that dynamic pages can be cached at the Web server and dynamic content can be served at the performance level of static content. We show that our system design is able to achieve cache hit ratios close to 100% for cached data which is almost never obsolete by more than a few seconds, if at all. Our architectures and algorithms provide more than an order of magnitude improvement in performance over conventional methods, while using an order of magnitude fewer servers.


ACM Transactions on Internet Technology | 2005

A fragment-based approach for efficiently creating dynamic web content

Jim Challenger; Paul M. Dantzig; Arun Iyengar; Karen Witting

This article presents a publishing system for efficiently creating dynamic Web content. Complex Web pages are constructed from simpler fragments. Fragments may recursively embed other fragments. Relationships between Web pages and fragments are represented by object dependence graphs. We present algorithms for efficiently detecting and updating Web pages affected after one or more fragments change. We also present algorithms for publishing sets of Web pages consistently; different algorithms are used depending upon the consistency requirements. Our publishing system provides an easy method for Web site designers to specify and modify inclusion relationships among Web pages and fragments. Users can update content on multiple Web pages by modifying a template. The system then automatically updates all Web pages affected by the change. Our system accommodates both content that must be proofread before publication, typically from humans, and content that has to be published immediately, typically from automated feeds. We discuss some of our experiences with real deployments of our system as well as its performance. We also quantitatively present characteristics of fragments used at a major deployment of our publishing system, including fragment sizes, update frequencies, and inclusion relationships.
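
A compact sketch of recursive fragment assembly and change propagation; the {frag:...} marker syntax and the fragment table are this sketch's conventions, not the system's.

```python
import re

FRAGMENTS = {
    "page:home":        "Welcome! {frag:headlines} {frag:scores}",
    "frag:headlines":   "Top stories ...",
    "frag:scores":      "Latest scores: {frag:medal_table}",
    "frag:medal_table": "Gold/Silver/Bronze standings ...",
}

EMBED = re.compile(r"\{((?:page|frag):\w+)\}")

def assemble(frag_id, fragments=FRAGMENTS):
    """Recursively expand embedded fragments into a complete page."""
    return EMBED.sub(lambda m: assemble(m.group(1), fragments), fragments[frag_id])

def affected_pages(changed, fragments=FRAGMENTS):
    """Walk the inclusion relationships (an object dependence graph)
    upward to find every page that embeds the changed fragment."""
    embeds = {fid: set(EMBED.findall(body)) for fid, body in fragments.items()}
    affected, frontier = set(), {changed}
    while frontier:
        frontier = {fid for fid, kids in embeds.items() if kids & frontier} - affected
        affected |= frontier
    return {fid for fid in affected if fid.startswith("page:")}

# Changing the medal table means only page:home must be republished:
assert affected_pages("frag:medal_table") == {"page:home"}
```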


Lecture Notes in Computer Science | 2001

Engineering Highly Accessed Web Sites for Performance

Jim Challenger; Arun Iyengar; Paul M. Dantzig; Daniel M. Dias; Nathaniel Mills

This paper describes techniques for improving performance at Web sites which receive significant traffic. Poor performance can be caused by dynamic data, insufficient network bandwidth, and poor Web page design. Dynamic data overheads can often be reduced by caching dynamic pages and using fast interfaces to invoke server programs. Web server acceleration can significantly improve performance and reduce the hardware needed at a Web site. We discuss techniques for balancing load among multiple servers at a Web site. We also show how Web pages can be designed to minimize traffic to the site.
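
Of the load-balancing techniques the paper covers, the simplest to illustrate is round-robin dispatch across replicated servers; this is a generic sketch, not the authors' implementation.

```python
import itertools

class RoundRobinBalancer:
    """Hand each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

lb = RoundRobinBalancer(["web1:80", "web2:80", "web3:80"])
print([lb.pick() for _ in range(4)])  # ['web1:80', 'web2:80', 'web3:80', 'web1:80']
```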


Global Communications Conference | 2002

Analysis of measurement data from sporting event Web sites

Zhen Liu; Mark S. Squillante; Cathy H. Xia; Shun-Zheng Yu; Li Zhang; Naceur Malouch; Paul M. Dantzig

With the growing popularity of Web applications, there is a considerable increase in the importance of managing Web sites to deliver high levels of performance and scalability to accommodate future growth and evolution. One of the key issues in this regard concerns a better understanding of the traffic patterns at multiple levels, such as the levels of requests, pages and sessions. This paper presents a detailed analysis of measurement data from various sources pertaining to a specific multi-tiered, geographically distributed architecture that has been used to host the Web sites for a number of recent, popular sporting events. Our analysis of the request-level and page-level patterns demonstrates differences among the Web sites depending upon the type of event and the breadth of interests of the user community. Some of these patterns are consistent with commercial Web sites, while others are significantly different. These results further illustrate geographical differences in the request-level and page-level patterns. Our analysis also investigates session-level characteristics in detail. This includes an analysis of the session durations, the think time distributions, the dependence structure of the session arrival process, and the page views comprising each session.
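
Session-level statistics of this kind are typically derived by splitting each client's request stream at an inactivity threshold; a sketch follows, where the 30-minute cutoff is a common heuristic, not the paper's value.

```python
from collections import defaultdict

CUTOFF = 1800.0  # seconds of inactivity that ends a session (assumed heuristic)

def sessionize(requests):
    """requests: iterable of (client_id, unix_timestamp) records.
    Returns per-session durations and intra-session think times."""
    by_client = defaultdict(list)
    for client, ts in requests:
        by_client[client].append(ts)
    durations, think_times = [], []
    for times in by_client.values():
        times.sort()
        start = prev = times[0]
        for ts in times[1:]:
            gap = ts - prev
            if gap > CUTOFF:                   # long silence: close the session
                durations.append(prev - start)
                start = ts
            else:
                think_times.append(gap)        # within-session think time
            prev = ts
        durations.append(prev - start)
    return durations, think_times
```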


International Conference on Parallel and Distributed Systems | 1996

A distributed connection manager interface for web services on IBM SP systems

Yew-Huey Liu; Paul M. Dantzig; Ching-Farn Eric Wu; Lionel M. Ni

In essence, the World Wide Web is a worldwide string of computer databases using a common information retrieval architecture. With the increasing popularity of the World Wide Web, more and more functions have been added to retrieve not only documents written in HTML (Hypertext Markup Language), but also those in other forms through the Common Gateway Interface (CGI), by constructing HTML documents dynamically. Dynamic construction of HTML documents for handling information such as digital libraries is slow and requires much more computing power. A significant performance bottleneck is the initialization and setup phase for a CGI process to gain access to the system containing the data. In this paper we describe the design and implementation of a Connection Manager Interface on IBM SP systems. The Connection Manager provides cliette processes to serve CGI requests and eliminates such bottlenecks. An IBM SP system is used to show that our design and implementation are flexible enough to take advantage of the High-Performance Switch in an IBM SP system. We trace and monitor this scalable Web service using UTE (Unified Trace Environment) tools, and present its performance analysis and visualization.
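
The bottleneck and its fix can be sketched with a pre-spawned pool: each long-lived worker (a "cliette") pays the back-end connection setup once and then serves many requests. The queue-based dispatch below is this sketch's assumption, not the paper's interface.

```python
import multiprocessing as mp

def cliette(requests, connect, handle):
    """Long-lived worker: one expensive connection, many cheap requests."""
    conn = connect()            # setup cost paid once, amortized over all requests
    while True:
        req = requests.get()
        if req is None:         # sentinel: shut the worker down
            break
        handle(conn, req)

def start_connection_manager(n_workers, connect, handle):
    """Pre-spawn cliettes so no request pays initialization on its own path."""
    requests = mp.Queue()
    workers = [mp.Process(target=cliette, args=(requests, connect, handle))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    return requests, workers
```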


European Conference on Research and Advanced Technology for Digital Libraries | 1998

Visualizing Document Classification: A Search Aid for the Digital Library

Yew-Huey Liu; Paul M. Dantzig; Martin William Sachs; James T. Corey; Mark T. Hinnebusch; Marc Damashek; Jonathan D. Cohen

The recent explosion of the internet has made digital libraries popular. The user-friendly interface of Web browsers allows a user much easier access to the digital library. However, to retrieve relevant documents from the digital library, the user is provided with a search interface consisting of one input field and one push button. Most users type in a single keyword, click the button, and hope for the best. The result of a query using this kind of search interface can consist of a large unordered set of documents, or a ranked list of documents based on the frequency of the keywords. Both lists can contain articles unrelated to the user's inquiry unless a sophisticated search was performed and the user knows exactly what to look for. More sophisticated algorithms for ranking the relevance of search results may help, but what is desperately needed are software tools that can analyze the search results and manipulate large hierarchies of data graphically. In this paper, we present a language-independent document classification system for the Florida Center for Library Automation to help users analyze search query results. Easy access through the Web is provided, as well as a graphical user interface to display the classification results.
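
The abstract does not name the classification algorithm; a plausible language-independent building block, in the spirit of co-author Marc Damashek's character n-gram work, is cosine similarity over n-gram count vectors, which needs no tokenizer, stemmer, or other language-specific machinery.

```python
from collections import Counter
from math import sqrt

def ngram_vector(text, n=4):
    """Character n-gram counts; works for any language or script."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Similarity in [0, 1] between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Search results can then be grouped for display by thresholding pairwise similarity.
doc1 = ngram_vector("library automation in Florida")
doc2 = ngram_vector("automating the Florida library")
print(round(cosine(doc1, doc2), 2))
```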


Software Engineering and Knowledge Engineering | 2002

Architecture and design of high volume web sites (a brief history of IBM sport and event web sites)

Paul M. Dantzig

Architecting and designing high volume Web sites has changed immensely over the last six years. These changes include the availability of inexpensive Pentium-based servers, Linux, Java applications, commodity switches, connection management and caching engines, bandwidth price reductions, content distribution services, and many others. This paper describes the evolution of the best practices within IBM in architecting sites that handle millions of page views per day. It discusses the transition to multi-tiered architectures, the use of publish/subscribe software to reduce Web site hits, the migration from static to dynamic content, and techniques for caching dynamic and personalized content.
