Bettina Schnor
University of Potsdam
Publications
Featured research published by Bettina Schnor.
Future Generation Computer Systems | 2008
Andre Luckow; Bettina Schnor
Especially for the sciences, the provision of massively parallel CPU capacity is one of the most attractive features of a grid. A major challenge in a distributed, inherently dynamic grid is fault tolerance. The more resources and components involved, the more complicated and error-prone the system becomes. In a grid with potentially thousands of machines connected to each other, the reliability of individual resources cannot be guaranteed. The benefit of the grid is that in case of a failure an application may be migrated and restarted from a checkpoint file on another site. This approach requires a service infrastructure that handles the necessary activities transparently. In this article, we present Migol, a fault-tolerant and self-healing grid middleware for MPI applications. Migol is based on open standards and extends the services of the Globus Toolkit to support the fault tolerance of grid applications. Further, the Migol framework itself is designed with special focus on fault tolerance. For example, Migol replicates critical services and uses a ring-based replication protocol to achieve data consistency.
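The abstract does not specify Migol's ring-based replication protocol. As a rough illustration only, the following Python sketch assumes replicas arranged in a logical ring: an update is applied and forwarded node by node until it would return to its origin, and replicas known to have failed are skipped. The function `propagate` and its parameters are hypothetical.

```python
def propagate(ring, states, alive, origin, key, value):
    """Send one update around the ring, starting at ring[origin].

    Each live replica applies the update and forwards it to its successor;
    replicas marked as failed are skipped. The loop ends when the update
    would arrive back at the origin, which then knows every live replica
    has applied it. Returns the order in which replicas applied the update.
    """
    order = []
    n = len(ring)
    for step in range(n):
        node = ring[(origin + step) % n]
        if alive[node]:
            states[node][key] = value
            order.append(node)
    return order


ring = ["A", "B", "C", "D"]
states = {node: {} for node in ring}
alive = {"A": True, "B": True, "C": False, "D": True}  # hypothetical failure of C
order = propagate(ring, states, alive, origin=1, key="job-42", value="site-2")
```

Here the update starts at B, skips the failed replica C, and reaches D and A before the ring closes.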
Philosophical Transactions of the Royal Society A | 2009
Andre Luckow; Shantenu Jha; Joohyun Kim; Andre Merzky; Bettina Schnor
Owing to the loose coupling between replicas, the replica–exchange (RE) class of algorithms should be able to benefit greatly from using as many resources as available. However, the ability to effectively use multiple distributed resources to reduce the time to completion remains a challenge at many levels. Additionally, an implementation of a pleasingly distributed algorithm such as replica–exchange, which is independent of infrastructural details, does not exist. This paper proposes an extensible and scalable framework based on Simple API for Grid Applications that provides a general-purpose, opportunistic mechanism to effectively use multiple resources in an infrastructure-independent way. By analysing the requirements of the RE algorithm and the challenges of implementing it on real production systems, we propose a new abstraction (BigJob), which forms the basis of the adaptive redistribution and effective scheduling of replicas.
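The loose coupling mentioned above comes from the fact that replicas only need to exchange a small amount of state when a swap is attempted. The sketch below assumes temperature-based replica exchange and uses the standard Metropolis criterion, p = min(1, exp((β_i − β_j)(E_i − E_j))) with β = 1/(k_B T). It is not taken from the paper or from the BigJob implementation, and the function names are hypothetical.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Only call exp for negative exponents to avoid overflow.
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(betas, energies, i, j, rng):
    """Try to exchange the configurations held at temperature slots i and j.

    Temperatures stay fixed per slot; swapping the energies stands in for
    swapping the configurations. Returns True if the swap was accepted.
    """
    p = swap_probability(betas[i], betas[j], energies[i], energies[j])
    if rng.random() < p:
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False
```

Because only energies and inverse temperatures cross replica boundaries, each replica can run as an independent job on whatever resource is available, which is what makes the algorithm attractive for opportunistic scheduling.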
International Conference on Intelligent Interactive Assistance and Mobile Multimedia Computing | 2009
Sebastian J. F. Fudickar; Bettina Schnor
In the aging sectors of societies in the western world, dementia and its characteristics such as disorientation and obliviousness are becoming a significant problem to an increasing number of people and health systems. In order to enable such dementia patients to regain a self-determined life, we have developed a mobile orientation system with a focus on minimal operational costs and a speech based human computer interface. This system assists dementia patients in everyday problems, such as remembering appointments and staying on track within their familiar surroundings as well as informing caretakers in critical situations.
Pervasive Technologies Related to Assistive Environments | 2012
Sebastian J. F. Fudickar; Christian Karth; Philipp Mahr; Bettina Schnor
Mobile fall-detection systems that use accelerometers with data pre-processing capabilities (such as the ADXL345) enable processors to remain longer in low-power modes and can therefore achieve extended device lifetimes. Since fall detection on these accelerometers is partially executed in hardware, developing and comparing fall-detection algorithms requires direct evaluation on the hardware, which increases complexity. We introduce a fall-detection simulator for developing and comparing fall-detection algorithms for accelerometers with and without partial in-hardware pre-processing. In addition, comprehensive records of falls and activities of daily living were generated for the simulator from recorded movements. With the help of the simulator, the sensitivity of a given fall-detection algorithm could be improved from 33% to 93%.
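To make the evaluation step concrete, here is a minimal sketch of the kind of algorithm such a simulator might compare: a free-fall phase (acceleration magnitude well below 1 g) followed shortly by an impact (magnitude well above 1 g). The thresholds, the window length, and the function names are hypothetical and not taken from the paper; sensitivity is computed as the share of recorded falls that the detector flags.

```python
import math


def magnitude(sample):
    """Acceleration magnitude in g for one (x, y, z) sample."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def detect_fall(samples, free_fall_g=0.4, impact_g=2.5, window=20):
    """Flag a fall if a free-fall phase is followed by an impact within `window` samples."""
    mags = [magnitude(s) for s in samples]
    for i, m in enumerate(mags):
        if m < free_fall_g and any(v > impact_g for v in mags[i + 1:i + 1 + window]):
            return True
    return False


def sensitivity(detector, recordings):
    """Fraction of recordings labelled as falls that the detector recognises."""
    falls = [samples for samples, is_fall in recordings if is_fall]
    if not falls:
        raise ValueError("no fall recordings")
    hits = sum(1 for samples in falls if detector(samples))
    return hits / len(falls)


# Synthetic traces for illustration only (not recorded data).
fall = [(0.0, 0.0, 1.0)] * 5 + [(0.0, 0.0, 0.1)] * 3 + [(0.0, 0.0, 3.0)] + [(0.0, 0.0, 1.0)] * 5
walk = [(0.0, 0.0, 1.2)] * 20
missed_fall = [(0.0, 0.0, 1.0)] * 10
```

Running different threshold settings against the same labelled recordings is what allows algorithms to be compared without redeploying them on the hardware.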
Network Computing and Applications | 2008
Andre Luckow; Bettina Schnor
A major challenge in a dynamic Grid with thousands of machines connected to each other is fault tolerance. The more resources and components involved, the more complicated and error-prone the system becomes. Migol is an adaptive Grid middleware that addresses the fault tolerance of Grid applications and services by providing the capability to recover applications from checkpoint files automatically. A critical aspect of automatic recovery is the availability of checkpoint files: if a resource becomes unavailable, it is very likely that the associated storage is also unreachable, e.g. due to a network partition. One strategy to increase the availability of checkpoints is replication. In this paper, we present the Checkpoint Replication Service. A key feature of this service is the ability to automatically replicate and monitor checkpoints in the Grid.
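The availability argument can be illustrated with a small sketch: keep the checkpoint at its origin site plus copies on other sites, and on recovery read from the first replica site that is still reachable. The placement policy (most free storage first) and the function names are hypothetical; the paper's actual placement and monitoring logic is not described in the abstract.

```python
def place_replicas(origin, free_space, k):
    """Keep the checkpoint at its origin site and copy it to the k other
    sites with the most free storage (hypothetical placement policy)."""
    others = sorted((site for site in free_space if site != origin),
                    key=lambda site: free_space[site], reverse=True)
    return [origin] + others[:k]


def locate_checkpoint(replica_sites, reachable):
    """Return the first replica site that is currently reachable, or None."""
    for site in replica_sites:
        if site in reachable:
            return site
    return None


free_space = {"A": 10, "B": 50, "C": 30, "D": 5}  # illustrative GB of free storage
sites = place_replicas("A", free_space, k=2)
```

If a network partition cuts off the origin site A and site B, recovery can still proceed from the copy on C.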
International Parallel and Distributed Processing Symposium | 2008
Andre Luckow; Bettina Schnor
A major challenge in a service-oriented environment such as a Grid is fault tolerance. The more resources and services involved, the more complicated and error-prone the system becomes. Migol (Luckow and Schnor, 2008) is a Grid middleware that addresses the fault tolerance of Grid applications and services. Migol's core component is its registry service, called the Application Information Service (AIS). To achieve fault tolerance and high availability, the AIS is replicated on different sites. Since a registry is a stateful Web service, replicating the AIS is not a trivial task. In this paper, we present our concept for active replication of Grid services. Migol's Replication Service uses a token-based algorithm and certificate-based security to provide secure group communication. Further, we show in different experiments that active replication in a real Grid environment is feasible.
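Active replication requires every replica to process the same requests in the same order. One common way a token-based algorithm achieves this is sketched below: only the current token holder may broadcast, and it stamps each message with the next global sequence number carried by the token. This is a generic illustration, not Migol's Replication Service; the class and method names are hypothetical, and the certificate-based security layer is omitted.

```python
from collections import deque


class TokenRing:
    """Only the token holder broadcasts; sequence numbers travel with the token,
    so every member delivers messages in the same total order."""

    def __init__(self, members):
        self.members = list(members)
        self.pending = {m: deque() for m in self.members}
        self.delivered = {m: [] for m in self.members}
        self.holder = 0      # index of the member currently holding the token
        self.next_seq = 0    # sequence counter carried by the token

    def submit(self, member, msg):
        """Queue a request at a member until it holds the token."""
        self.pending[member].append(msg)

    def pass_token(self):
        """Let the holder broadcast its queued messages, then pass the token on."""
        name = self.members[self.holder]
        queue = self.pending[name]
        while queue:
            msg = queue.popleft()
            for m in self.members:
                self.delivered[m].append((self.next_seq, name, msg))
            self.next_seq += 1
        self.holder = (self.holder + 1) % len(self.members)


group = TokenRing(["A", "B", "C"])
group.submit("B", "register job")
group.submit("A", "update status")
group.submit("C", "remove job")
for _ in range(3):
    group.pass_token()
```

Even though B submitted first, A holds the token first, so every replica delivers A's message with sequence number 0; what matters for consistency is that all replicas agree on the order.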
Green Computing and Communications | 2010
Simon Kiertscher; Jörg Zinke; Stefan Gasterstädt; Bettina Schnor
This paper presents the design and implementation of an energy saving daemon for clusters called cherub. The design of the cherub daemon is modular and extensible. In the field of High Performance Computing (HPC), well-known Resource Management Systems (RMSs) such as the Portable Batch System (PBS) [8] or its open-source derivative, the TORQUE resource manager [11], are used to manage clusters and work queues. Thus, cherub is able to interact with different RMSs to make them energy aware. cherub is also suited for load-balancing clusters managed by dispatchers like Linux Virtual Server (LVS) [17], since the only requirement is a central approach to resource management.
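The core decision an energy-saving daemon of this kind has to make can be sketched as a simple policy: boot powered-off nodes when more jobs are queued than idle nodes can take, and shut down idle nodes beyond a small spare pool when the queue is short. The policy, the one-job-per-node assumption, and all names below are hypothetical and not taken from cherub; in a real deployment the node states and queue length would come from the RMS.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    powered_on: bool
    busy: bool


def plan_power_actions(nodes, queued_jobs, spare_nodes=1):
    """Return (to_boot, to_shutdown) lists of node names.

    Assumes each queued job needs exactly one node.
    """
    idle = [n for n in nodes if n.powered_on and not n.busy]
    off = [n for n in nodes if not n.powered_on]
    if queued_jobs > len(idle):
        # Not enough idle nodes: boot as many powered-off nodes as needed.
        needed = queued_jobs - len(idle)
        return [n.name for n in off[:needed]], []
    # Keep enough idle nodes for the queue plus a spare pool; power off the rest.
    surplus = len(idle) - queued_jobs - spare_nodes
    if surplus > 0:
        return [], [n.name for n in idle[:surplus]]
    return [], []


cluster = [Node("n1", True, True), Node("n2", True, False),
           Node("n3", False, False), Node("n4", False, False)]
```

Keeping a spare pool trades a little energy for lower start-up latency of newly submitted jobs, since booting a node takes time.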
IEEE International Conference on eScience | 2008
Andre Luckow; Shantenu Jha; Joohyun Kim; Andre Merzky; Bettina Schnor
There exists a class of scientific applications for which utilizing distributed resources is critical for reducing the time-to-solution. In this paper, we discuss a specific class of applications, Replica-Exchange simulations, where the orchestration of many distributed jobs in a dynamic and inherently unreliable distributed environment is essential for successful completion. We describe the design, development and deployment of a unique framework for constructing fault-tolerant distributed simulations. The framework consists of two primary components: SAGA and Migol. SAGA is a high-level programmatic abstraction layer that provides a standardised interface for the primary distributed functionality required for application development. We present details of a newly developed functionality in SAGA, the Checkpoint and Recovery (CPR) API. Migol is an adaptive middleware, which supports the fault tolerance of distributed applications by providing the capability to recover applications from checkpoint files transparently. In addition to describing the integration of SAGA-CPR with the Migol infrastructure, we outline our experiences with running a large-scale, general-purpose Replica-Exchange application in a production distributed environment.
Local Computer Networks | 2002
Giuseppe Ciaccio; Marco Ehlert; Bettina Schnor
In this paper, we report on the recently completed porting of GAMMA to the Netgear GA621 Gigabit Ethernet adapter, and provide a comparison among GAMMA, MPI/GAMMA, TCP/IP and MPICH/TCP, based on the Netgear GA621 and the older Netgear GA620 network adapters and using different device drivers, in a Gigabit Ethernet cluster of PCs running Linux 2.4. GAMMA (the Genoa Active Message Machine) is a lightweight messaging system based on an active message-like paradigm, originally designed for efficient exploitation of Fast Ethernet interconnects. The comparison includes simple latency/bandwidth evaluation of the messaging systems on both adapters, as well as performance comparisons based on the NAS Parallel Benchmarks and an end-user fluid dynamics application called Modular Ocean Model (MOM). The analysis of results provides useful hints concerning the efficient use of Gigabit Ethernet with clusters of PCs. In particular, it emerges that GAMMA on the GA621 adapter, with a combination of low end-to-end latency (8.5 μs) and high throughput (118.4 MByte/s), provides a high-performance, cost-effective alternative to proprietary high-speed networks, e.g. Myrinet, for a wide range of cluster computing applications.
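To see why both figures matter, the reported latency and throughput can be plugged into a first-order linear cost model, T(n) = latency + n / bandwidth. This model is an assumption (it is not from the paper, and measured curves are rarely perfectly linear), as is the reading of 1 MByte as 10^6 bytes. Under it, the message size at which half the peak bandwidth is reached is simply latency × bandwidth.

```python
# Figures reported above for GAMMA on the GA621; assumes 1 MByte = 10**6 bytes.
LATENCY_S = 8.5e-6
BANDWIDTH_BPS = 118.4e6


def transfer_time(n_bytes, latency=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Linear model: fixed start-up latency plus size / asymptotic bandwidth."""
    return latency + n_bytes / bandwidth


def effective_bandwidth(n_bytes):
    """Achieved throughput in bytes/s for a single message of n_bytes."""
    return n_bytes / transfer_time(n_bytes)


# Message size at which half the asymptotic bandwidth is reached.
N_HALF = LATENCY_S * BANDWIDTH_BPS  # about 1006 bytes

for size in (64, 1024, 65536):
    print(f"{size:6d} B: {transfer_time(size) * 1e6:7.2f} us, "
          f"{effective_bandwidth(size) / 1e6:6.1f} MB/s")
```

Under this model, messages of roughly 1 KB already reach half the peak rate, which is why low start-up latency matters for communication-intensive applications such as the NAS Parallel Benchmarks.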
International Conference on Logic Programming | 2011
Martin Gebser; Roland Kaminski; Benjamin Kaufmann; Torsten Schaub; Bettina Schnor
We report on three recent advances in the distributed ASP solver claspar. First, we describe its flexible architecture supporting various search strategies, including competitive search using a portfolio of solver configurations. Second, we describe claspar's distributed learning capacities, which allow learned nogoods to be shared among solver instances. Finally, we discuss claspar's approach to distributed optimization.