Peter Sobe
University of Lübeck
Publications
Featured research published by Peter Sobe.
International Parallel and Distributed Processing Symposium | 2003
Peter Sobe
Interacting processes in distributed systems save their checkpoints on local disks for efficiency reasons. But because local checkpoints become unavailable when hosts fail, redundancy schemes similar to RAID-like storage schemes have to be used. In such systems, checkpoints are stable under a particular fault model because they can be reconstructed within the distributed system. In this paper, two variants of stable checkpoint storage are compared: (i) parity grouping over local checkpoints and (ii) RAID-like distribution of each checkpoint using a software-based distributed storage system. An analysis compares the costs of collective checkpoint creation, recovery of a single process, and rollback of all processes. The results show that, despite differences in detail, checkpointing using a distributed storage system is a reasonable solution.
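A minimal sketch of variant (i), parity grouping over local checkpoints, assuming equally sized checkpoints; the names and data are illustrative, not taken from the paper:

```python
# Parity grouping sketch: each process keeps its checkpoint on a local
# disk; one XOR parity block (stored on a separate host) allows a single
# lost checkpoint to be reconstructed from the survivors.

def xor_blocks(blocks):
    """XOR a list of equally sized byte blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Hypothetical local checkpoints of three processes (equal size).
checkpoints = [b"ckpt-A...", b"ckpt-B...", b"ckpt-C..."]
parity = xor_blocks(checkpoints)  # kept on a dedicated parity host

# Host 1 fails: rebuild its checkpoint from the survivors plus parity.
recovered = xor_blocks([checkpoints[0], checkpoints[2], parity])
assert recovered == checkpoints[1]
```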
Storage Network Architecture and Parallel I/Os | 2003
Peter Sobe
Distributing large data objects among several storage servers is a common technique to speed up access rates. In combination with parity schemes, failures of single server nodes can be tolerated, so such systems reach a certain degree of fault tolerance. In this paper, such a distributed server system is analyzed. Data objects are stored in a layout according to RAID level 3 among the disk subsystems of different computers. An access control mechanism provides concurrent up- and down-streaming of data objects to and from the distributed storage system with guaranteed data consistency. This consistency control is described in combination with the handling of faulty server nodes and faulty clients. Furthermore, performance is measured for several access patterns. One application of this technique is a distributed video server that allows permanent updates without interrupting access.
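A hedged sketch of the RAID-level-3-style layout described above: data striped round-robin over a set of data servers plus one dedicated parity server. The names and the stripe unit are assumptions for illustration, not the paper's system:

```python
# RAID-3-style striping across N data servers with one parity server.

N_DATA = 3

def stripe(data: bytes):
    """Split data round-robin over N_DATA servers and compute parity."""
    shares = [bytearray() for _ in range(N_DATA)]
    for i, b in enumerate(data):
        shares[i % N_DATA].append(b)
    maxlen = max(len(s) for s in shares)
    for s in shares:                       # pad shares to equal length
        s.extend(b"\x00" * (maxlen - len(s)))
    parity = bytearray(maxlen)
    for s in shares:                       # parity = XOR of all shares
        for i, b in enumerate(s):
            parity[i] ^= b
    return [bytes(s) for s in shares], bytes(parity)

def recover(shares, parity, lost: int):
    """Reconstruct the share of one failed data server."""
    rebuilt = bytearray(parity)
    for j, s in enumerate(shares):
        if j != lost:
            for i, b in enumerate(s):
                rebuilt[i] ^= b
    return bytes(rebuilt)

shares, parity = stripe(b"distributed video object")
assert recover(shares, parity, 1) == shares[1]
```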
Microprocessors and Microsystems | 2008
Volker Hampel; Peter Sobe; Erik Maehle
In this paper we present an implementation of a Reed/Solomon (R/S) coprocessor for a hybrid computing system that combines general-purpose CPUs with FPGAs. The coprocessor accelerates the encoding of user data that is stored block-wise on a distributed, failure-tolerant storage system. We document design constraints and their impact on the resulting architecture. Measurements characterize the performance of the coprocessor in terms of computational bandwidth, latency, and the hardware-software interaction. For comparison, software-based R/S encoding implementations are presented and evaluated as well. The two variants of the FPGA-based coprocessor are compared with each other with respect to their suitability for a distributed storage application.
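For orientation, the kind of table-driven GF(2^8) arithmetic a software R/S baseline typically uses (the reduction polynomial 0x11d and the generator-matrix row below are assumptions, not the paper's parameters):

```python
# Log/antilog tables for GF(2^8) with reduction polynomial 0x11d.
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 512):          # duplicate to avoid a modulo in gf_mul
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    """Multiply in GF(2^8) via log/antilog lookup."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

# One redundancy byte as a GF-weighted sum of data bytes (one step of
# R/S encoding with a hypothetical generator-matrix row).
data = [0x12, 0xA7, 0x3C]
coeffs = [0x01, 0x02, 0x04]
red = 0
for d, c in zip(data, coeffs):
    red ^= gf_mul(d, c)
print(hex(red))
```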
Network Computing and Applications | 2006
Peter Sobe; Kathrin Peter
Reliable distributed data storage systems have to employ redundancy codes to tolerate the loss of storage nodes. Many appropriate codes and algorithms can be found in the literature, but efficient schemes for tolerating several storage failures, and their embedding in a distributed system, are still research issues. In this paper, a variety of redundancy schemes implemented in a distributed storage system are compared. All schemes are based on parity and Reed/Solomon codes and are integrated in the storage system NetRAID, which allows several user-specified layouts to be configured. A performance and reliability analysis of several data and redundancy layouts is presented that combines analytical and experimental results. In detail, we present performance results for an optimized Reed/Solomon implementation and outline how encoding and recovery can be sped up by reconfigurable hardware employed in the distributed storage system.
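A back-of-the-envelope comparison in the spirit of such a reliability analysis, under the simplifying assumption of independent storage failures with probability p (this model and the numbers are illustrative, not the paper's):

```python
# Probability of data loss for layouts tolerating m of n node failures.
from math import comb

def p_data_loss(n, m, p):
    """Probability that more than m of n storage nodes fail."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(m + 1, n + 1))

p = 0.01
print("8+1 parity:", p_data_loss(9, 1, p))    # tolerates 1 failure
print("8+2 R/S   :", p_data_loss(10, 2, p))   # tolerates 2 failures
```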
Storage Network Architecture and Parallel I/Os | 2010
Peter Sobe
Cauchy Reed/Solomon is an XOR-based erasure-tolerant coding scheme, applied for reliable distributed storage, fault-tolerant memory, and reconstruction of content from widely distributed data. Encoding and decoding are based on XOR operations and are already well supported by microprocessors. On multicore processors, the coding procedures should also exploit parallelism to speed up coding. In this paper we derive coding procedures from code parameters (e.g. the number of tolerated failures) and propose their transformation into parallel coding schedules that are mapped onto multicore processors. We (i) compare functionally decomposed coding procedures with data-parallel coding of different blocks, and (ii) specify the method to derive these schedules.
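A sketch of turning a binary coding matrix into an XOR schedule, the kind of step such a derivation involves. The matrix below is a placeholder; real Cauchy Reed/Solomon derives it by expanding a Cauchy matrix over GF(2^w) bitwise:

```python
# Each row lists which data blocks are XORed into one redundancy block.
binary_matrix = [
    [1, 0, 1, 1],
    [0, 1, 1, 0],
]

def schedule(matrix):
    """Flatten matrix rows into a list of XOR operations (dst, src)."""
    return [(r, c)
            for r, row in enumerate(matrix)
            for c, bit in enumerate(row) if bit]

def encode(data_blocks, ops, n_red):
    red = [bytearray(len(data_blocks[0])) for _ in range(n_red)]
    for r, c in ops:                 # redundancy r ^= data block c
        for i, b in enumerate(data_blocks[c]):
            red[r][i] ^= b
    return red

blocks = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
red = encode(blocks, schedule(binary_matrix), 2)
# Data-parallel variant: different byte ranges of the blocks can be
# encoded by different cores running the same schedule; the functional
# decomposition would instead assign whole schedule rows to cores.
```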
Network Computing and Applications | 2008
Peter Sobe; Kathrin Peter
Distributed storage systems apply erasure-tolerant codes to guarantee reliable access to data despite failures of storage resources. While many codes can be mapped to XOR operations and implemented efficiently on common microprocessors, usually only a few codes (out of a wide variety) are implemented in a given system. The ability to easily include new codes, to exchange codes, and to select codes for particular types of data is desirable. To provide this flexibility, a parameterization is used that allows the definition of different XOR-based codes and, beyond that, of different styles of encoding and decoding. The parameters include (i) the assignment of data and redundancy elements to the storage resources and (ii) a description of encoding and decoding algorithms as XOR-based equations. The parameters of a particular code can be changed, and a wide variety of codes can be described and included in a storage system implementation. The proposed parameterization adopts the ability of codes like EVENODD, Cauchy R/S, and HoVer to map onto distributed resources. Furthermore, encoding and decoding algorithms can be described differently, either for minimal coding cost or for minimal coding time on parallel systems.
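A minimal sketch of the parameterization idea: the code is data, not hard-wired logic, with (i) a layout map and (ii) XOR equations for encoding and decoding. The concrete code below is plain parity as a placeholder, not one of the paper's parameter sets:

```python
# A code described entirely by parameters.
code = {
    "layout": {"d0": "node0", "d1": "node1", "d2": "node2", "p0": "node3"},
    "encode": {"p0": ["d0", "d1", "d2"]},
    "decode": {"d1": ["d0", "d2", "p0"]},   # rebuild d1 if node1 fails
}

def apply_equations(equations, elements):
    """Evaluate XOR equations target = src0 ^ src1 ^ ... over byte blocks."""
    out = {}
    for target, sources in equations.items():
        acc = bytearray(len(elements[sources[0]]))
        for s in sources:
            for i, b in enumerate(elements[s]):
                acc[i] ^= b
        out[target] = bytes(acc)
    return out

data = {"d0": b"AAAA", "d1": b"BBBB", "d2": b"CCCC"}
data.update(apply_equations(code["encode"], data))        # add redundancy
lost = {k: v for k, v in data.items() if k != "d1"}       # node1 fails
assert apply_equations(code["decode"], lost)["d1"] == b"BBBB"
```

Swapping in a different code then means changing the dictionaries, not the storage system's implementation.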
Automation, Robotics and Control Systems | 2007
Peter Sobe; Volker Hampel
Distributed storage systems often have to guarantee data availability despite failures or temporary downtimes of storage nodes. For this purpose, a deletion-tolerant code is applied that allows missing parts of a codeword to be reconstructed, i.e. a certain number of failures to be tolerated. The Reed/Solomon (R/S) code is the most general deletion-tolerant code and can be adapted to a required number of tolerable failures. In terms of information overhead, R/S is optimal, but it consumes significantly more computation power than parity-based codes. Reconfigurable hardware can be employed for particular finite-field operations in R/S coding through specialized arithmetic, so that the higher computation effort is compensated by faster and parallel operations. We present architectures for application-specific acceleration by FPGAs. In this paper, strategies for efficient communication with the accelerating FPGA and a performance comparison between a pure software-based solution and the accelerated system are provided.
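To illustrate the "specialized arithmetic" angle, here is a bit-serial GF(2^8) multiplication in the shift-and-XOR style that maps naturally onto FPGA logic, one stage per bit (the reduction polynomial 0x11d is an assumption; the paper's configuration may differ):

```python
def gf_mul_hw(a: int, b: int) -> int:
    """GF(2^8) multiply as a hardware datapath would unroll it:
    conditional XOR, shift, and polynomial reduction per bit."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1d   # reduce modulo x^8 + x^4 + x^3 + x^2 + 1
    return result

print(hex(gf_mul_hw(0x57, 0x83)))
```

In hardware, the eight loop iterations become eight parallel or pipelined stages, which is where the speedup over table lookups in software comes from.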
IEEE International Computer Performance and Dependability Symposium (IPDS) | 2000
Peter Sobe
Cooperative applications are widely used, e.g. as parallel calculations or distributed information processing systems. While such applications meet users' demands and offer performance improvements, the susceptibility to faults of any participating computer node rises; often a single fault may cause a complete application failure. On the other hand, the redundancy in distributed systems can be exploited for fast fault detection and recovery. We therefore follow an approach based on duplication of each application process to detect crashes and faulty functioning of single computer nodes. We concentrate on two aspects of efficient fault tolerance: fast fault detection and recovery without significantly delaying the application's progress. The contribution of this work is, first, a new fault detection protocol for duplicated processes, and second, an enhanced roll-forward recovery scheme that is applicable to a set of cooperative processes in conformity with the protocol.
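A toy illustration of the duplication principle only, reduced to comparing duplicate outputs and deadlines in one process; the paper's actual protocol coordinates duplicated application processes across nodes:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, TimeoutError as Timeout

def worker(x):
    """Stand-in for one replica of an application step."""
    return hashlib.sha256(str(x * x).encode()).hexdigest()

def duplicated_call(x, timeout=1.0):
    """Run the step twice; a missed deadline signals a crash fault,
    a result mismatch signals a value fault."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(worker, x) for _ in range(2)]
        try:
            results = [f.result(timeout=timeout) for f in futures]
        except Timeout:
            raise RuntimeError("crash fault: a replica missed its deadline")
    if results[0] != results[1]:
        raise RuntimeError("value fault: replica outputs disagree")
    return results[0]

print(duplicated_call(42))
```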
International Parallel and Distributed Processing Symposium | 2001
Peter Sobe
Data striping is widely used and well understood in Redundant Arrays of Independent Disks (RAID), where an array of disks is controlled by a single device controller. By striping and adding parity information, distribution of I/O load and fault tolerance can be assured. This study focuses on the problem of update/read consistency of files that are striped among several nodes, e.g. in a clustered server for media streaming applications. Common RAID systems offer a single entry point, so that a read operation delivers either the content before an update or the updated content. This property is lost when arbitrary nodes may access data stripes without a centralized access facility; thus, coordination between updates and concurrent read operations is necessary. A common solution is to lock files or blocks that are currently being updated. We propose an alternative solution that ensures delivery of valid content during updates without locking blocks globally. It can be used, for instance, to update the original content of video servers without temporary copies and access limitations.
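One way such lock-free update/read consistency can work is stripe versioning, sketched below under that assumption (this is an illustration in the spirit of the abstract, not the paper's exact protocol):

```python
class VersionedStripes:
    """Readers pin the version current when they open the file; an
    update installs a new version with a single atomic switch, so no
    stripe is ever locked against readers."""

    def __init__(self, stripes):
        self.versions = {0: list(stripes)}   # version -> stripe list
        self.current = 0

    def open_read(self):
        return self.current                  # reader pins this version

    def read(self, version, idx):
        return self.versions[version][idx]

    def update(self, new_stripes):
        v = self.current + 1
        self.versions[v] = list(new_stripes)
        self.current = v                     # atomic visibility switch

f = VersionedStripes([b"s0", b"s1"])
v = f.open_read()                # reader starts before the update...
f.update([b"S0", b"S1"])
assert f.read(v, 0) == b"s0"     # ...and still sees consistent old content
```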
International Parallel and Distributed Processing Symposium | 2004
Peter Sobe
Summary form only given. Clustering several storage servers is a common way to build fast and fault-tolerant storage systems. One application arises in the context of parallel programs that already run on clustered systems and need to write and read huge amounts of data to and from disks. Another application field is Web and video streaming servers, which cause intense data transfer from and to disks. A distributed storage system is reviewed under the aspects of fault tolerance and reconfiguration of the data layout after faults. Data objects are stored in a layout according to RAID level 3 among the disk subsystems of different computers. Concurrent up- and down-streaming of data is provided by a technique that ensures data consistency, which has been found to be beneficial for concurrent access and reconfiguration. Moreover, the system does not need a metadata server, which often represents a bottleneck in distributed storage systems.
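The metadata-server-free property can be illustrated by deterministic placement: clients compute stripe locations from the object name alone, so no central lookup is needed. This is an assumed illustration of the claimed property, not the system's actual placement function:

```python
import hashlib

SERVERS = ["node0", "node1", "node2", "node3"]   # hypothetical cluster

def place(obj_name: str, stripe_no: int) -> str:
    """Map (object, stripe) to a server purely by hashing, so every
    client computes the same location without asking a metadata server."""
    h = hashlib.sha1(f"{obj_name}/{stripe_no}".encode()).digest()
    return SERVERS[h[0] % len(SERVERS)]

for s in range(4):
    print(f"video.mp4 stripe {s} ->", place("video.mp4", s))
```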