Publication


Featured research published by Eugen Schenfeld.


Conference on High Performance Computing (Supercomputing) | 2005

On the Feasibility of Optical Circuit Switching for High Performance Computing Systems

Kevin J. Barker; Alan F. Benner; Raymond R. Hoare; Adolfy Hoisie; Darren J. Kerbyson; Dan Li; Rami G. Melhem; Ramakrishnan Rajamony; Eugen Schenfeld; Shuyi Shao; Craig B. Stunkel; Peter A. Walker

The interconnect plays a key role in both the cost and performance of large-scale HPC systems. The cost of future high-bandwidth electronic interconnects rises sharply because of the expensive optical transceivers needed between electronic switches. We describe a potentially cheaper and more power-efficient approach to building high-performance interconnects. Through empirical analysis of HPC applications, we find that the bulk of inter-processor communication (excluding collectives) is bounded in degree and changes slowly or not at all. We therefore propose a two-network interconnect: an Optical Circuit Switching (OCS) network that handles long-lived bulk data transfers using optical switches, and a secondary, lower-bandwidth Electronic Packet Switching (EPS) network. An OCS can be significantly cheaper, as it uses fewer optical transceivers than an electronic network. Collectives and transient communication packets traverse the electronic network. We present compiler techniques and dynamic run-time policies for using this two-network interconnect. Simulation results show that our approach provides high performance at low cost.
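
The abstract's split between long-lived bulk traffic (OCS) and transient or collective traffic (EPS) can be illustrated with a small traffic-classification sketch. Everything below is an assumption made for illustration: the thresholds, the per-node port count, and the function names are not parameters or mechanisms from the paper.

```python
# Hypothetical sketch: split application traffic between an Optical Circuit
# Switching (OCS) plane and an Electronic Packet Switching (EPS) plane.
# Thresholds and port counts are illustrative, not taken from the paper.

def plan_routes(traffic, ocs_ports_per_node=8, bulk_threshold=1 << 20):
    """traffic: dict mapping (src, dst) -> bytes exchanged per phase."""
    ports_used = {}          # node -> number of OCS ports already assigned
    circuits, packet_routed = [], []

    # Consider the heaviest flows first; they benefit most from a circuit.
    for (src, dst), nbytes in sorted(traffic.items(), key=lambda kv: -kv[1]):
        heavy = nbytes >= bulk_threshold
        has_ports = (ports_used.get(src, 0) < ocs_ports_per_node and
                     ports_used.get(dst, 0) < ocs_ports_per_node)
        if heavy and has_ports:
            circuits.append((src, dst))          # long-lived bulk transfer -> OCS
            ports_used[src] = ports_used.get(src, 0) + 1
            ports_used[dst] = ports_used.get(dst, 0) + 1
        else:
            packet_routed.append((src, dst))     # transient / light traffic -> EPS
    return circuits, packet_routed

example = {(0, 1): 64 << 20, (1, 2): 48 << 20, (0, 3): 4 << 10}
print(plan_routes(example))
```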


International Parallel and Distributed Processing Symposium | 2003

System management in the BlueGene/L supercomputer

George S. Almasi; Leonardo R. Bachega; Ralph Bellofatto; José R. Brunheroto; Calin Cascaval; José G. Castaños; Paul G. Crumley; C. Christopher Erway; Joseph Gagliano; Derek Lieber; Pedro Mindlin; José E. Moreira; Ramendra K. Sahoo; Alda Sanomiya; Eugen Schenfeld; Richard A. Swetz; Myung M. Bae; Gregory D. Laib; Kavitha Ranganathan; Yariv Aridor; Tamar Domany; Y. Gal; Oleg Goldshmidt; Edi Shmueli

The BlueGene/L supercomputer will use system-on-a-chip integration and a highly scalable cellular architecture to deliver 360 teraflops of peak computing power. With 65536 compute nodes, BlueGene/L represents a new level of scalability for parallel systems. As such, it is natural for many scalability challenges to arise. In this paper, we discuss system management and control, including machine booting, software installation, user account management, system monitoring, and job execution. We address the issue of scalability by organizing the system hierarchically. The 65536 compute nodes are organized in 1024 clusters of 64 compute nodes each, called processing sets. Each processing set is under the control of a 65th node, called an I/O node. The 1024 processing sets can then be managed to a great extent as a regular Linux cluster, of which there are several successful examples. Regular cluster management is complemented by BlueGene/L-specific services, performed by a service node over a separate control network. Our software development and experiments have been conducted so far using an architecturally accurate simulator of BlueGene/L, and we are gearing up to test real prototypes in 2003.
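
The hierarchy the abstract describes is simple arithmetic: 65,536 compute nodes grouped into 1,024 processing sets of 64, each fronted by one I/O node. The sketch below shows that mapping; the linear node-numbering scheme is an assumption made only for illustration, not BlueGene/L's actual numbering.

```python
# Illustrative arithmetic behind BlueGene/L's hierarchical management:
# 65,536 compute nodes = 1,024 processing sets x 64 nodes, one I/O node per set.
# The linear node-numbering scheme below is an assumption, not BG/L's scheme.

NODES_PER_PSET = 64
NUM_PSETS = 1024
TOTAL_COMPUTE_NODES = NODES_PER_PSET * NUM_PSETS   # 65,536

def processing_set(compute_node_id):
    """Return (pset_id, rank_within_pset) for a compute node."""
    assert 0 <= compute_node_id < TOTAL_COMPUTE_NODES
    return divmod(compute_node_id, NODES_PER_PSET)

print(processing_set(4464))   # -> (69, 48): node 4464 sits in processing set 69
```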


Journal of Parallel and Distributed Computing | 2013

Generating synthetic task graphs for simulating stream computing systems

Deepak Ajwani; Shoukat Ali; Kostas Katrinis; Cheng-Hong Li; Alfred Park; John P. Morrison; Eugen Schenfeld

Stream computing is an emerging computational model for performing complex operations on and across multi-source, high-volume data flows. The pool of mature, publicly available applications employing this model is fairly small, so workloads for various types of applications are scarce. There is therefore a need to synthetically generate large-scale workloads to drive simulations and estimate the performance of stream-computing applications at scale. We identify the key properties shared by most task graphs of stream-computing applications and use them to extend known random graph generation concepts with stream-computing-specific features, providing researchers with realistic input stream graphs. Our graph generation techniques are designed to cover a diversity of potential applications and user input. Our first, domain-specific framework exhibits high user-controlled configurability, while the second, application-agnostic framework focuses solely on emulating the key properties of general stream-computing systems, at the cost of domain-specific fine-tuning.
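
As an illustration of the kind of generator the abstract describes, the sketch below builds a layered, feed-forward random task graph. The layer count, width, and fan-out bounds are assumptions made for illustration; the paper's actual generators and their parameterizations are more elaborate.

```python
import random

# Hypothetical sketch of a layered random DAG generator for stream task graphs.
# Real stream graphs have designated sources/sinks and mostly-forward edges;
# layer sizes and fan-out bounds here are illustrative, not the paper's models.

def synthetic_stream_graph(layers=5, width=4, max_fan_out=2, seed=0):
    rng = random.Random(seed)
    levels = [[f"t{l}_{i}" for i in range(rng.randint(1, width))]
              for l in range(layers)]
    edges = []
    for l in range(layers - 1):
        for task in levels[l]:
            targets = rng.sample(levels[l + 1],
                                 k=min(max_fan_out, len(levels[l + 1])))
            edges.extend((task, t) for t in targets)
    return levels, edges

levels, edges = synthetic_stream_graph()
print(len(edges), "edges across", len(levels), "layers")
```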


Computing Frontiers | 2012

Adaptive task duplication using on-line bottleneck detection for streaming applications

Yoonseo Choi; Cheng-Hong Li; Dilma Da Silva; Alan Bivens; Eugen Schenfeld

In this paper we describe an approach to dynamically improving the progress of streaming applications on SMP multi-core systems. We show that run-time task duplication is an effective method for maximizing application throughput in the face of changes in available computing resources; such changes cannot be fully handled by static optimizations. We derive a theoretical performance model to identify tasks in need of more computing resources, and propose two on-line algorithms that use indications from the performance model to detect computation bottlenecks. In these algorithms, a task can identify itself as a bottleneck using only its local data. The proposed technique is transparent to end programmers and portable to systems with fair scheduling, and our on-line detection algorithms can be applied to other dynamic scenarios, for example those involving run-time variation of workload. Our experiments using the StreamIt benchmarks [5] show that the proposed run-time task duplication achieves considerable speedups over the multi-threaded baseline on a 16-core machine and in scenarios with a dynamically changing number of processing cores. We also show that our algorithms achieve better application throughput than alternative approaches to task duplication.
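
A minimal sketch of a task deciding locally that it is a bottleneck is shown below. The queue-occupancy heuristic, thresholds, and class names are illustrative assumptions; the paper derives its bottleneck indicator from a performance model rather than this simple rule.

```python
# Hypothetical sketch of local, on-line bottleneck detection for a streaming task.
# The queue-occupancy heuristic and thresholds are illustrative assumptions;
# the paper derives its indicator from a theoretical performance model.

class StreamTask:
    def __init__(self, name, window=16, occupancy_threshold=0.8):
        self.name = name
        self.samples = []                 # recent input-queue occupancy ratios
        self.window = window
        self.occupancy_threshold = occupancy_threshold
        self.replicas = 1

    def observe(self, input_queue_depth, queue_capacity):
        """Record a local sample; duplicate if the input queue stays nearly full."""
        self.samples.append(input_queue_depth / queue_capacity)
        if len(self.samples) >= self.window:
            recent = self.samples[-self.window:]
            if sum(recent) / self.window > self.occupancy_threshold:
                self.replicas += 1        # spawn a duplicate of this task
                self.samples.clear()

task = StreamTask("fir_filter")
for depth in [60] * 20:                   # queue persistently near capacity 64
    task.observe(depth, queue_capacity=64)
print(task.replicas)                      # -> 2 after the bottleneck is detected
```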


International Parallel and Distributed Processing Symposium | 2012

Switching Optically-Connected Memories in a Large-Scale System

Abhirup Chakraborty; Eugen Schenfeld; Dilma Da Silva

Recent trends in processor and memory systems in large-scale computing systems reveal a new memory wall that prompts investigation of alternative main memory organizations which separate main memory from processors and arrange them in distinct ensembles. In this paper, we study the feasibility of transferring data across processors by using the optical interconnection fabric that acts as a bridge between the processor and memory ensembles. We propose a memory switching protocol that transfers data across processors without physically moving the data across electrical switches. Such a mechanism allows large-scale data communication across processors through the transfer of a few tiny blocks of meta-data. We present detailed techniques for supporting two communication patterns prevalent in large-scale scientific and data management applications. We present experimental results analyzing the feasibility of memory switching for a wide range of applications, and characterize the applications based on the impact of memory switching on their performance.
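
The protocol's key effect, moving only metadata while the payload stays in the memory ensemble, can be sketched as follows. The descriptor fields, the fabric interface, and the handoff steps are all assumptions made for illustration and are not the paper's actual protocol.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of "switching" an optically-connected memory region:
# only a small ownership descriptor moves between processors; the bulk data
# stays where it is. All names and handoff steps are illustrative assumptions.

@dataclass
class RegionDescriptor:              # the tiny metadata block that is transferred
    region_id: int
    base_address: int
    length: int
    owner: int                       # processor currently attached to the region

@dataclass
class OpticalFabric:                 # stand-in for the circuit-switched fabric
    attachments: dict = field(default_factory=dict)   # region_id -> processor

    def detach(self, processor, region_id):
        self.attachments.pop(region_id, None)

    def attach(self, processor, region_id):
        self.attachments[region_id] = processor

def switch_region(desc, new_owner, fabric):
    """Hand a memory region to another processor by exchanging metadata only."""
    fabric.detach(desc.owner, desc.region_id)    # tear down the sender's circuit
    fabric.attach(new_owner, desc.region_id)     # set up the receiver's circuit
    desc.owner = new_owner
    return desc                                  # no payload bytes were copied

fabric = OpticalFabric()
region = RegionDescriptor(region_id=7, base_address=0x1000, length=1 << 30, owner=0)
fabric.attach(0, region.region_id)
switch_region(region, new_owner=3, fabric=fabric)
print(fabric.attachments)                        # {7: 3} -- ownership moved, data did not
```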


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2011

Analytical Performance Modeling for Null Message-Based Parallel Discrete Event Simulation

Cheng-Hong Li; Alfred Park; Eugen Schenfeld

This paper presents a new analytical performance analysis of null message-based parallel discrete event simulation (PDES). Our analysis builds upon the key operation of selecting simulation events for processing in the null message algorithms. The results not only explain the well-known facts of how the lookahead capability of individual simulation processes (called logical processes, or LPs) affects simulation performance, but also reveal quantitatively how the lookahead, the communication topology, the computation and communication delays, and the flow control mechanism affect simulation performance. We first show that all of the LPs in a strongly connected component of the communication topology asymptotically progress at the same speed, regardless of their individual characteristics and their share of computation resources. Second, we derive an analytical upper bound on the simulation performance. The derivation shows that the ratio between the sum of the lookaheads and the sum of the event processing and communication delays of the LPs in a cycle bounds the simulation speed from above, and that the cycle of LPs imposing the tightest upper bound becomes the bottleneck of the simulation. We conduct a series of simulation experiments to empirically validate our findings. Moreover, we show that by using the derived upper bound as optimization guidance, we improve the partitioning of a simple parallel simulation example and achieve a fourfold speedup over the same simulation based on a classic min-cut partitioning strategy.
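
The cycle-based bound described above can be stated compactly. Writing L_i for the lookahead, p_i for the event-processing delay, and c_i for the communication delay of LP i (our notation, paraphrasing the abstract, not the paper's), the bound on simulation speed reads roughly as follows:

```latex
% Cycle bound on simulation speed, paraphrasing the abstract (notation is ours):
% for every directed cycle C in the LP communication topology,
\[
  \text{speed} \;\le\; \min_{\text{cycles } C}
    \frac{\sum_{i \in C} L_i}{\sum_{i \in C} \left( p_i + c_i \right)},
\]
% and the cycle attaining the minimum is the bottleneck of the simulation.
```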


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2011

A Flexible Workload Generator for Simulating Stream Computing Systems

Deepak Ajwani; Shoukat Ali; Kostas Katrinis; Cheng-Hong Li; Alfred Park; John P. Morrison; Eugen Schenfeld

Stream computing is an emerging computational model for performing complex operations on and across multi-source, high-volume data flows. Given that the deployment of the model has only started, the pool of mature applications employing this model is fairly small, and therefore the availability of workloads for various types of applications is scarce. Thus, there is a need for synthetic generation of large-scale workloads for evaluation of stream computing applications at scale. This paper presents a framework for producing synthetic workloads for stream computing systems. Our framework extends known random graph generation concepts with stream-computing-specific features, providing researchers with realistic input stream graphs and allowing them to focus on system development, optimization and analysis. Serving the goal of covering a diversity of potential applications, the presented framework exhibits high user-controlled configurability. The produced workloads could be used to drive simulations for performance evaluation and for proof-of-concept prototyping of processing, networking and operating system hardware and software.
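
The "high user-controlled configurability" suggests a generator driven by a parameter set. A hypothetical example of such a configuration is sketched below; every parameter name and value is an assumption for illustration and not part of the framework's actual interface.

```python
# Hypothetical configuration for a user-configurable stream-workload generator.
# Every parameter name and value here is an illustrative assumption.

workload_config = {
    "num_tasks": 10_000,             # size of the synthetic stream graph
    "num_sources": 32,               # ingest operators
    "num_sinks": 8,                  # egress operators
    "fan_out": {"distribution": "zipf", "exponent": 1.4, "max": 6},
    "depth": 40,                     # longest source-to-sink path
    "cycle_fraction": 0.0,           # keep the graph feed-forward
    "task_cost": {"distribution": "lognormal", "mu": 2.0, "sigma": 0.5},
}
```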


Simulation | 2012

Towards flexible exascale stream processing system simulation

Alfred Park; Cheng-Hong Li; Ravi Nair; Nobuyuki Ohba; Uzi Shvadron; Ayal Zaks; Eugen Schenfeld

Stream processing is an important emerging computational model for performing complex operations on and across multi-source, high-volume, unpredictable dataflows. We present Flow, a platform for parallel and distributed stream processing system simulation that provides a flexible modeling environment for analyzing stream processing applications. The Flow stream processing system simulator is a high-performance, scalable simulator that automatically parallelizes chunks of the model space and incurs near-zero synchronization overhead for acyclic stream application graphs. We show promising parallel and distributed event rates exceeding 149 million events per second on a cluster with 512 processor cores.
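
One way to see why acyclic (feed-forward) stream graphs admit near-zero synchronization: operators can be grouped by topological level, and each level depends only on the one before it, so parallel workers never need to exchange null messages. The grouping-by-level sketch below is our illustration, not Flow's actual partitioning of the model space.

```python
from collections import defaultdict, deque

# Illustrative sketch: group operators of an acyclic stream graph by topological
# level; operators at the same level can be simulated concurrently with no
# synchronization beyond finishing the previous level. Details are assumptions.

def topological_levels(edges):
    """edges: iterable of (upstream, downstream) operator pairs."""
    succ, indeg = defaultdict(list), defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    frontier = deque(n for n in nodes if indeg[n] == 0)
    level = {n: 0 for n in frontier}
    while frontier:
        u = frontier.popleft()
        for v in succ[u]:
            level[v] = max(level.get(v, 0), level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                frontier.append(v)
    return level

print(topological_levels([("src", "filter"), ("filter", "join"), ("src2", "join")]))
```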


Workshop on Parallel and Distributed Simulation | 2010

Flow: A Stream Processing System Simulator

Alfred Park; Cheng-Hong Li; Ravi Nair; Nobuyuki Ohba; Uzi Shvadron; Ayal Zaks; Eugen Schenfeld

Stream processing is an important emerging computational model for performing complex operations on and across multi-source, high volume, unpredictable dataflows. We present Flow, a platform for parallel and distributed stream processing system simulation that provides a flexible modeling environment for analyzing stream processing applications. The Flow stream processing system simulator is a high performance, scalable simulator that automatically parallelizes chunks of the model space and incurs near zero synchronization overhead for stream application graphs that exhibit feed-forward behavior. We show promising multi-threaded and multi-process event rates exceeding 80 million events per second on a cluster with 256 processor cores.


International Conference on Cluster Computing | 2002

Blue Gene/L, a system-on-a-chip

George S. Almasi; G.S. Almasi; D. Beece; Ralph Bellofatto; G. Bhanot; R. Bickford; M. Blumrich; Arthur A. Bright; José R. Brunheroto; Cǎlin Caşcaval; José G. Castaños; Luis Ceze; R. Coteus; S. Chatterjee; D. Chen; G. Chiu; T.M. Cipolla; Paul G. Crumley; A. Deutsch; M.B. Dombrowa; W. Donath; M. Eleftheriou; B. Fitch; Joseph Gagliano; Alan Gara; R. Germain; M.E. Giampapa; Manish Gupta; F. Gustavson; S. Hall

Summary form only given. Large powerful networks coupled to state-of-the-art processors have traditionally dominated supercomputing. As technology advances, this approach is likely to be challenged by a more cost-effective system-on-a-chip approach with higher levels of system integration. The scalability of applications to architectures with tens to hundreds of thousands of processors is critical to the success of this approach. Significant progress has been made in mapping numerous compute-intensive applications, many of them grand challenges, to parallel architectures. Applications hoping to execute efficiently on future supercomputers of any architecture must be coded in a manner consistent with an enormous degree of parallelism. The BG/L program is developing a supercomputer with a nominal peak of 180 TFLOPS (360 TFLOPS for some applications) to serve a broad range of science applications. BG/L generalizes QCDOC, the first system-on-a-chip supercomputer, which is expected in 2003. BG/L consists of 65,536 nodes and contains five integrated networks: a 3D torus, a combining tree, a Gb Ethernet network, a barrier/global interrupt network, and JTAG.
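
For a sense of scale, the 65,536 nodes form a 3D torus in which every node has six neighbors, found by wrapping each coordinate. The 64 x 32 x 32 shape used below is assumed only to make the arithmetic concrete (64 * 32 * 32 = 65,536), not a claim about the machine's actual dimensions.

```python
# Illustrative 3D-torus neighbor calculation for a 65,536-node machine.
# The 64 x 32 x 32 shape is an assumption used only to make the arithmetic concrete.

DIMS = (64, 32, 32)        # 64 * 32 * 32 = 65,536 nodes

def torus_neighbors(x, y, z, dims=DIMS):
    dx, dy, dz = dims
    return [
        ((x + 1) % dx, y, z), ((x - 1) % dx, y, z),
        (x, (y + 1) % dy, z), (x, (y - 1) % dy, z),
        (x, y, (z + 1) % dz), (x, y, (z - 1) % dz),
    ]

print(torus_neighbors(0, 0, 0))   # each node has exactly six torus neighbors
```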
