Publications


Featured research published by Germán S. Goldszmidt.


Integrated Network Management | 2001

Oceano-SLA based management of a computing utility

Karen Appleby; Sameh A. Fakhouri; Liana Fong; Germán S. Goldszmidt; Michael H. Kalantar; Srirama Mandyam Krishnakumar; Donald P. Pazel; John Arthur Pershing; Benny Rochwerger

Oceano is a prototype of a highly available, scalable, and manageable infrastructure for an e-business computing utility. It enables multiple customers to be hosted on a collection of sequentially shared resources. The hosting environment is divided into secure domains, each supporting one customer. These domains are dynamic: the resources assigned to them may be augmented when the load increases and reduced when load dips. This dynamic resource allocation enables flexible service level agreements (SLAs) with customers in an environment where peak loads are an order of magnitude greater than the normal steady state.
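
The augment-on-peak, release-on-dip behavior described above can be sketched as a small control loop. This is an illustrative sketch only, not Oceano's implementation; the thresholds, the free pool, and the domain structure are invented for the example.

```python
# Hypothetical sketch of SLA-driven reallocation: a domain grows from a
# shared free pool when load is high and shrinks back when load dips.
# All names and thresholds here are illustrative, not Oceano's.

FREE_POOL = ["s4", "s5"]  # servers not currently assigned to any domain

def rebalance(domain, load, high=0.8, low=0.2):
    """Grow a domain's server set on high load, shrink it on low load."""
    if load > high and FREE_POOL:
        domain["servers"].append(FREE_POOL.pop())   # augment on peak
    elif load < low and len(domain["servers"]) > domain["min_servers"]:
        FREE_POOL.append(domain["servers"].pop())   # release on dip
    return domain

customer_a = {"servers": ["s1", "s2"], "min_servers": 1}
rebalance(customer_a, load=0.9)   # peak: one free server is assigned
rebalance(customer_a, load=0.1)   # dip: the server returns to the pool
```

Because released servers go back to a shared pool, one customer's dip funds another customer's peak, which is what makes the flexible SLAs economical.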


International World Wide Web Conference | 1998

Network dispatcher: a connection router for scalable Internet services

Guerney D. H. Hunt; Germán S. Goldszmidt; Richard P. King; Rajat Mukherjee

Network Dispatcher (ND) is a TCP connection router that supports load sharing across several TCP servers. Prototypes of Network Dispatcher were used to support several large-scale, high-load Web sites. Network Dispatcher provides a fast IP packet-forwarding kernel extension to the TCP/IP stack. Load sharing is supported by a user-level manager process that monitors the load on the servers and controls the connection allocation algorithm in the kernel extension. This paper describes the design of Network Dispatcher, outlines Network Dispatcher's performance in the context of HTTP traffic, and presents several of its features, including high availability, support for WANs, and client affinity.
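
The split described above — a manager that sets per-server weights from measured load, and a fast path that maps each new connection to a server and keeps later packets of that connection on the same server — can be sketched as follows. The names and the pick-highest-weight rule are our invention, not ND's actual API or algorithm.

```python
# Illustrative sketch of connection routing with manager-set weights.
# A connection table keeps established flows sticky to their server;
# only new connections consult the weights.

connection_table = {}               # (client_ip, client_port) -> server
weights = {"web1": 3, "web2": 1}    # manager-set: web1 is lightly loaded

def route(client_ip, client_port):
    key = (client_ip, client_port)
    if key not in connection_table:              # new connection:
        # pick the server with the highest weight (least loaded)
        connection_table[key] = max(weights, key=weights.get)
    return connection_table[key]                 # existing: same server

assert route("10.0.0.7", 4711) == "web1"
weights["web1"] = 0               # manager reacts to rising load on web1
assert route("10.0.0.7", 4711) == "web1"  # established flow stays put
assert route("10.0.0.8", 4712) == "web2"  # new flows go elsewhere
```

Keeping the per-packet path to a table lookup, and confining load measurement to the user-level manager, is what lets the kernel extension stay fast.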


Principles of Distributed Computing | 2001

On scalable and efficient distributed failure detectors

Indranil Gupta; Tushar Deepak Chandra; Germán S. Goldszmidt

Process groups in distributed applications and services rely on failure detectors to detect process failures completely, and as quickly, accurately, and scalably as possible, even in the face of unreliable message deliveries. In this paper, we quantify the optimal scalability, in terms of network load (in messages per second, with messages having a size limit), of distributed, complete failure detectors as a function of application-specified requirements. These requirements are 1) quick failure detection by some non-faulty process, and 2) accuracy of failure detection. We assume a crash-recovery (non-Byzantine) failure model, and a network model that is probabilistically unreliable (w.r.t. message deliveries and process failures). First, we characterize, under certain independence assumptions, the optimal worst-case network load imposed by any failure detector that achieves an application's requirements. We then discuss why traditional heartbeating schemes are inherently unscalable according to the optimal load. We also present a randomized, distributed failure detector algorithm that imposes an equal expected load per group member. This protocol satisfies the application-defined constraints of completeness, accuracy, and speed of detection on average. It imposes a network load that differs from the optimal by a sub-optimality factor that is much lower than that for traditional distributed heartbeating schemes. Moreover, this sub-optimality factor does not vary with group size (for large groups).
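
The equal-expected-load property of the randomized algorithm can be illustrated with a toy round in which every live member pings one uniformly chosen other member and declares it failed if no ack arrives. This is a simplified sketch of the idea, not the paper's protocol; `alive` stands in for real message delivery.

```python
import random

# One round of a randomized ping-based detector: each live member sends
# exactly one ping (equal expected load), to a uniformly random target.

def protocol_round(members, alive, rng):
    """Return the set of (detector, target) detections this round."""
    detections = set()
    for m in members:
        if m not in alive:
            continue                       # crashed members send nothing
        target = rng.choice([x for x in members if x != m])
        if target not in alive:            # ping times out: no ack
            detections.add((m, target))
    return detections

members = ["p1", "p2", "p3", "p4"]
alive = {"p1", "p2", "p3"}                 # p4 has crashed
hits = protocol_round(members, alive, random.Random(1))
# any detection this round must name the crashed member; whether anyone
# happens to ping p4 is chance, which is why detection speed holds on
# average rather than in the worst case
assert all(t == "p4" for _, t in hits)
```

Contrast with all-to-all heartbeating, where each member's send load grows with group size; here each member sends one ping per period regardless of group size.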


Network Operations and Management Symposium | 1998

Load management for scaling up Internet services

Germán S. Goldszmidt

As global Internet traffic increases, many popular sites are often unable to serve their TCP/IP workload, particularly during peak periods of activity. For example, Web servers for sports events are often swamped by requests during and after games. To address this problem, many sites allocate multiple server hosts to concurrently handle the incoming requests. To support workload sharing, they need a method to distribute the requests among the servers. Since network traffic is self-similar, with waves of heavy traffic at peak times, this requires dynamic feedback control. In this presentation we analyse several solutions to this scaling problem (client-based and DNS-based), and show some of their deficiencies. We then present our preferred method, which is based on IP packet forwarding and is transparent to both clients and servers. We implemented a TCP/IP load-management tool, NetDispatcher, that enables scalable, heterogeneous TCP/IP server clusters that can handle millions of TCP connections per hour. NetDispatcher does not perform any TCP/IP header translations. Hence, outgoing server-to-client packets need no processing and can follow a separate network route to the clients, resulting in improved bandwidth utilization and lower latency. NetDispatcher transparently handles server failures, and NetDispatcher failures are handled by a shadow node without losing active connections.
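
The "dynamic feedback control" mentioned above can be sketched as a periodic weight update: smooth each server's reported load, then give lightly loaded servers a larger share of new connections. The smoothing constant and the load-to-weight mapping are illustrative assumptions, not NetDispatcher's actual algorithm.

```python
# Sketch of feedback-driven weights: exponentially smooth raw load
# samples (bursty, self-similar traffic makes raw samples noisy) and
# map lighter load to a larger routing weight. Constants are invented.

def update_weights(smoothed, samples, alpha=0.5):
    """EWMA-smooth load samples and turn them into routing weights."""
    weights = {}
    for server, load in samples.items():
        smoothed[server] = alpha * load + (1 - alpha) * smoothed.get(server, load)
        weights[server] = max(1.0 - smoothed[server], 0.0)  # lighter -> larger
    return weights

smoothed = {}
w1 = update_weights(smoothed, {"a": 0.2, "b": 0.9})
assert w1["a"] > w1["b"]          # lightly loaded server gets more traffic
w2 = update_weights(smoothed, {"a": 0.95, "b": 0.1})
assert w2["a"] < w1["a"]          # a load spike promptly shrinks a's share
```

Smoothing keeps the controller from overreacting to single bursts while still tracking the sustained waves of peak traffic.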


ACM Transactions on Computer Systems | 1990

High-level language debugging for concurrent programs

Germán S. Goldszmidt; Shaula Yemini; Shmuel Katz

An integrated system design for debugging distributed programs written in concurrent high-level languages is described. A variety of user-interface, monitoring, and analysis tools integrated around a uniform process model are provided. Because the tools are language-based, the user does not have to deal with low-level implementation details of distribution and concurrency, and can instead focus on the logic of the program in terms of language-level objects and constructs. The tools provide facilities for experimentation with process scheduling, environment simulation, and nondeterministic selections. Presentation and analysis of the program's behavior are supported by history replay, state queries, and assertion checking. Assertions are formulated in linear-time temporal logic, which is particularly well suited to specifying the behavior of distributed programs. The tools are separated into two sets. The language-specific tools are those that directly interact with programs, supporting monitoring of, and on-line experimentation with, distributed programs. The language-independent tools are those that support off-line presentation and analysis of the monitored information. This separation makes the system applicable to a wide range of programming languages. In addition, the separation of interactive experimentation from off-line analysis allows efficient use of both user time and machine resources. The implementation of a debugging facility for OCCAM is described.
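
Checking a temporal-logic assertion against a recorded history, as described above, can be illustrated on a finite trace with a property of the form "always, a trigger is eventually followed by a response" (G(p → F q)). The event format and property are invented for the example.

```python
# Minimal off-line check of a linear-temporal-logic assertion over a
# recorded execution history: every 'send' must eventually be followed
# by an 'ack'. Event names are illustrative.

def always_eventually(history, trigger, response):
    """Check G(trigger -> F response) over a finite trace."""
    for i, event in enumerate(history):
        if event == trigger and response not in history[i + 1:]:
            return False          # a trigger with no later response
    return True

trace = ["send", "work", "ack", "send", "ack"]
assert always_eventually(trace, "send", "ack")
assert not always_eventually(["send", "work"], "send", "ack")
```

Running such checks over a replayed history, rather than live, is what lets the language-independent analysis tools stay decoupled from the monitored program.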


Journal of Network and Systems Management | 2002

Yemanja—A Layered Fault Localization System for Multi-Domain Computing Utilities

Karen Appleby; Germán S. Goldszmidt; Malgorzata Steinder

Yemanja is a model-based event correlation engine for multi-layer fault diagnosis. It targets complex propagating fault scenarios, and can smoothly correlate low-level network events with high-level application performance alerts related to quality-of-service violations. Entity models that represent devices or abstract components encapsulate their behavior. Distantly associated entity models are not explicitly aware of each other, and communicate through internal event chains. Yemanja's state-based engine supports generic scenario definitions, prioritization of alternate solutions, integrated problem and device testing, and simultaneous analysis of overlapping problems. The system of correlation rules was developed based on the analysis of device and layer functions, and the dependencies among physical and abstract system components. The primary objectives of this research include the development of reusable, configuration-independent correlation scenarios; adaptability and extensibility of the engine to match the constantly changing topology of a multi-domain server farm; and the development of a concise specification language that is relatively simple yet powerful.
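
The "internal event chain" idea — entity models that know only their immediate neighbors, with a fault propagating along the chain until it explains a high-level alert — can be rendered as a toy dependency walk. The topology and names are invented; Yemanja's actual models are far richer.

```python
# Toy event-chain correlation: each entity model knows only its
# immediate upstream dependency, so a QoS alert at the top can be
# traced to a low-level fault without any model knowing the full chain.

depends_on = {"sla_monitor": "web_app", "web_app": "switch_port"}

def root_cause(alert_entity, faulty):
    """Walk the dependency chain from the alerting entity to a fault."""
    entity = alert_entity
    while entity is not None:
        if entity in faulty:
            return entity
        entity = depends_on.get(entity)   # ask the next model down
    return None

# a QoS alert at the SLA layer is correlated with a port fault
assert root_cause("sla_monitor", faulty={"switch_port"}) == "switch_port"
```

Because each model only declares its local dependency, the chain can be re-wired as the server farm's topology changes, which is the reusability the abstract emphasizes.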


International Conference on Computer Languages | 1992

High-level language support for programming distributed systems

Joshua S. Auerbach; David F. Bacon; Arthur P. Goldberg; Germán S. Goldszmidt; Ajei Sarat Gopal; Mark T. Kennedy; Andy Lowry; James R. Russell; William Silverman; Robert E. Strom; Daniel M. Yellin; Shaula Yemini

A strategy for simplifying the programming of heterogeneous distributed systems is presented. The approach is based on integrating a high-level distributed programming model, the process model, directly into programming languages. Distributed applications written in such languages are portable across different environments, are shorter, and are simpler to develop than similar applications developed using conventional approaches. The process model is discussed, and Hermes and Concert/C, two languages that implement this model, are described. Hermes is a secure, representation-independent language designed explicitly around the process model. Concert/C is the C language augmented with a small set of extensions to support the process model while allowing reuse of existing C code. Hermes has been prototyped; an implementation of Concert/C is in development.


Cluster Computing and the Grid | 2002

Neptune: A Dynamic Resource Allocation and Planning System for a Cluster Computing Utility

Donald P. Pazel; Tamar Eilam; Liana L. Fong; Michael H. Kalantar; Karen Appleby; Germán S. Goldszmidt

We present Neptune, the resource director of Océano, a policy-driven fabric management system that dynamically reconfigures resources in a computing utility cluster. Neptune implements an on-line control mechanism subject to policy-based performance and resource configuration objectives. Neptune reassigns servers and bandwidth among a set of service domains, based on pre-defined policy, in response to workload changes. It builds and executes a reconfiguration plan through a planning framework, breaking reconfiguration objectives into individual tasks delegated to a set of lower-level resource managers. We describe an example decision policy algorithm that we implemented and demonstrated in an 80-server multi-domain computing utility.
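
Breaking a reconfiguration objective into ordered tasks for lower-level managers, as described above, can be sketched as a small plan expansion. The task names and their ordering are our invention for illustration, not Neptune's plan language.

```python
# Hypothetical plan expansion: one objective (move a server between
# service domains) becomes an ordered task list, each task delegated
# to a lower-level resource manager. Task names are illustrative.

def plan_move(server, src, dst):
    return [
        ("scrub", server),            # wipe the server between customers
        ("detach", server, src),      # network manager: leave src domain
        ("attach", server, dst),      # network manager: join dst domain
        ("configure", server, dst),   # install dst's application stack
    ]

steps = plan_move("s9", "customerA", "customerB")
assert [t[0] for t in steps] == ["scrub", "detach", "attach", "configure"]
```

Executing the plan as discrete, ordered tasks is what lets failures be localized: a failed step can be retried or the plan revised without redoing completed work.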


Journal of Parallel and Distributed Computing | 1991

The design of a stub generator for heterogeneous RPC systems

Yi-Hsiu Wei; Alexander D. Stoyenko; Germán S. Goldszmidt

A stub generator is a tool used to support distributed applications communicating via remote procedure calls. A stub generator generates stubs, which serve as local agents acting on behalf of remote callers and callees. We describe a novel stub generator design which supports RPCs between programs written in different programming languages and running on machines of different architectures. The stub generation includes determining procedure interface compatibility and determining whether format representation conversion is possible. Both direct and indirect conversion schemes are supported. Stubs are constructed from language-dependent templates that capture the essential syntactic and semantic structure of data types in the supported language. The stub generator itself is language independent. Marshaling and unmarshaling of scalar and composite data types, including recursive or pointer-based data types, are supported.
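
The marshaling of scalar and composite (including recursive) data that the abstract mentions can be illustrated by flattening values into an architecture-neutral byte form. The tag-based wire format below is invented for the example; it is not the paper's representation.

```python
import struct

# Illustrative marshaling: tag each value and emit fixed-endian bytes,
# recursing into composite types. The format (I/S/L tags, big-endian
# sizes) is our invention.

def marshal(value):
    """Serialize ints, strings, and (possibly nested) lists of them."""
    if isinstance(value, int):
        return b"I" + struct.pack(">q", value)           # big-endian int64
    if isinstance(value, str):
        data = value.encode()
        return b"S" + struct.pack(">I", len(data)) + data
    if isinstance(value, list):                          # composite type:
        body = b"".join(marshal(v) for v in value)       # marshal recursively
        return b"L" + struct.pack(">I", len(value)) + body
    raise TypeError(f"unsupported type: {type(value)!r}")

wire = marshal([1, "hi", [2, 3]])
assert wire[0:1] == b"L"          # outer value is a composite
```

Fixing the byte order in the wire form is what makes the stubs usable between machines of different architectures; the language-dependent templates would map each language's types onto such a neutral form.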


Foundations of Computer Science | 2001

Gulfstream - a system for dynamic topology management in multi-domain server farms

S.A. Fakhouri; Germán S. Goldszmidt; Indranil Gupta

This paper describes GulfStream, a scalable distributed software system designed to address the problem of managing the network topology in a multi-domain server farm. In particular, it addresses the following core problems: topology discovery and verification, and failure detection. Unlike most topology discovery and failure detection systems, which focus on the nodes in a cluster, GulfStream logically organizes the network adapters of the server farm into groups. Each group contains those adapters that can directly exchange messages. GulfStream dynamically establishes a hierarchy for reporting network topology and availability of network adapters. We describe a prototype implementation of GulfStream on a 55-node heterogeneous server farm interconnected using switched fast Ethernet.
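
Grouping adapters by direct reachability, as described above, amounts to partitioning them into connected components. A minimal sketch, with reachability modeled as an undirected edge set and all data invented:

```python
# Sketch of GulfStream-style grouping: partition network adapters into
# groups whose members can directly exchange messages, by flooding
# within each reachable segment. Adapter names and links are invented.

def group_adapters(adapters, can_reach):
    """Partition adapters into direct-reachability groups."""
    groups, seen = [], set()
    for a in adapters:
        if a in seen:
            continue
        group, frontier = set(), [a]
        while frontier:                      # flood within one segment
            x = frontier.pop()
            if x in group:
                continue
            group.add(x)
            frontier += [y for y in adapters
                         if (x, y) in can_reach or (y, x) in can_reach]
        seen |= group
        groups.append(group)
    return groups

links = {("eth0", "eth1"), ("eth2", "eth3")}  # two isolated segments
gs = group_adapters(["eth0", "eth1", "eth2", "eth3"], links)
assert sorted(map(sorted, gs)) == [["eth0", "eth1"], ["eth2", "eth3"]]
```

Working at the adapter level rather than the node level means a multi-homed server appears in several groups, which is what lets the system see a per-network view of availability.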
