Thomas Jejkal
Karlsruhe Institute of Technology
Publications
Featured research published by Thomas Jejkal.
IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2011
Ariel Garcia; S. Bourov; Ahmad Hammad; Jos van Wezel; Bernhard Neumair; Achim Streit; Volker Hartmann; Thomas Jejkal; Patrick Neuberger; Rainer Stotzka
The Large Scale Data Facility (LSDF) at the Karlsruhe Institute of Technology was started at the end of 2009 with the aim of supporting the growing requirements of data intensive experiments. In close cooperation with the involved scientific communities, the LSDF provides them not only with adequate storage space but also with a directly attached analysis farm and -- more importantly -- with value-added services for their big scientific datasets. Analysis workflows are supported through the mixed Hadoop and OpenNebula cloud environments directly attached to the storage, and enable the efficient processing of the experimental data. Metadata handling is a central part of the LSDF, where a metadata repository, community-specific metadata schemes, graphical tools, and APIs were developed for accessing and efficiently organizing the stored datasets.
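The abstract does not spell out the repository interface; purely as an illustration of the kind of metadata-driven access such an API might offer, the Java sketch below queries a hypothetical REST endpoint for a dataset's metadata record. The URL, path, and dataset identifier are invented for the example and are not part of the LSDF API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Illustrative only: fetches the metadata record of a dataset from a hypothetical LSDF endpoint. */
public class MetadataLookup {
    public static void main(String[] args) throws Exception {
        String datasetId = args.length > 0 ? args[0] : "demo-dataset-001"; // placeholder identifier
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://lsdf.example.org/metadata/" + datasetId)) // placeholder URL
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body()); // community-specific metadata document
    }
}
```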
Parallel, Distributed and Network-Based Processing | 2011
Rainer Stotzka; Volker Hartmann; Thomas Jejkal; Michael Sutter; Jos van Wezel; Marcus Hardt; Ariel Garcia; Rainer Kupsch; S. Bourov
To cope with the growing requirements of data intensive scientific experiments, models and simulations, the Large Scale Data Facility (LSDF) at KIT aims to support many scientific disciplines. The LSDF is a distributed storage facility at exabyte scale providing storage, archives, databases and metadata repositories. Open interfaces and APIs support a variety of access methods to the highly available services for high throughput data applications. Tools for easy and transparent access allow scientists to use the LSDF without bothering with the internal structures and technologies. In close cooperation with the scientific communities the LSDF provides assistance to efficiently organize data and metadata structures, and develops and deploys community-specific software on the directly connected computing infrastructure.
Parallel, Distributed and Network-Based Processing | 2012
Thomas Jejkal; Volker Hartmann; Rainer Stotzka; Jens C. Otte; Ariel Garcia; Jos van Wezel; Achim Streit
To cope with the growing requirements of data intensive scientific experiments, models and simulations, the Large Scale Data Facility (LSDF) at KIT aims to support many scientific disciplines. The LSDF is a distributed storage facility at exabyte scale providing storage, archives, databases and metadata repositories. Apart from data storage, many scientific communities need to perform data processing operations as well. For this purpose the LSDF Execution Framework for Data Intensive Applications (LAMBDA) was developed to allow asynchronous high-performance data processing next to the LSDF. However, it is not restricted to the LSDF or to any feature only available there. The main goal of LAMBDA is to simplify large scale data processing for scientific users by reducing complexity, responsibility and error-proneness. The description of an execution is handled in the background, as part of LAMBDA administration, via metadata that can be obtained from arbitrary sources. Thus, the scientific user only has to select which applications to apply to the data.
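As a rough sketch of the usage model described above (select an application, apply it to a dataset, let the framework execute it asynchronously), the following Java snippet uses standard CompletableFuture machinery. The application registry, dataset reference and all names are invented for the sketch and are not the LAMBDA API.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

/**
 * Illustrative only: models the idea of "pick an application, apply it to your data,
 * let the framework run it asynchronously". All names are invented for this sketch.
 */
public class LambdaStyleSubmission {

    /** A registered processing application, as it might be described by metadata. */
    record Application(String name, java.util.function.UnaryOperator<String> run) {}

    public static void main(String[] args) {
        // Applications the framework could have discovered from an application registry.
        List<Application> registry = List.of(
                new Application("to-upper", String::toUpperCase),
                new Application("reverse", s -> new StringBuilder(s).reverse().toString()));

        // The user only selects an application and a dataset reference; execution is asynchronous.
        Application selected = registry.get(0);
        String datasetRef = "lsdf://community/dataset-42"; // placeholder reference

        CompletableFuture<String> job = CompletableFuture
                .supplyAsync(() -> "payload of " + datasetRef)   // stand-in for staging the data
                .thenApply(selected.run());                      // stand-in for running the application

        System.out.println(job.join()); // a real framework would let the user poll job status instead
    }
}
```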
IEEE Symposium on Large Data Analysis and Visualization | 2011
Ariel Garcia; S. Bourov; Ahmad Hammad; Volker Hartmann; Thomas Jejkal; Jens C. Otte; S. Pfeiffer; T. Schenker; C. Schmidt; P. Neuberger; Rainer Stotzka; J. van Wezel; Bernhard Neumair; Achim Streit
The Large Scale Data Facility (LSDF) was conceived and launched at the Karlsruhe Institute of Technology (KIT) at the end of 2009 to address the growing need for value-added storage services for data intensive experiments. The LSDF's main focus is to support scientific experiments producing large data sets reaching into the petabyte range with adequate storage, support and value-added services for data management, processing and preservation. In this work we describe the approach taken to perform data analysis in the LSDF, as well as the data management of the scientific datasets.
International Conference on Big Data | 2015
Ajinkya Prabhune; Rainer Stotzka; Thomas Jejkal; Volker Hartmann; Margund Bach; Eberhard Schmitt; Michael Hausmann; Juergen Hesser
Exponential growth in scientific research data demands novel measures for managing extremely large datasets. In particular, due to advancements in high-resolution microscopy, the nanoscopy research community is producing datasets in the range of multiple terabytes (TB). Systematically acquired datasets of biological specimens are composed of multiple high-resolution images and reach 150-200 TB. The management of these extremely large datasets requires an optimized Generic Client Service (GCS) API integrated into a data repository system. The novel API proposed in this paper provides an abstract interface that connects various disparate systems. The API is optimized to provide efficient and automated ingest and download of the data and management of its metadata. The ingest and download processes are based on well-defined workflows described in this paper, as is the base metadata model for the comprehensive description of the datasets. The API is seamlessly integrated with a digital data repository system, namely KIT Data Manager, to make it adaptable for a wide range of communities. Finally, a simple and easy to use command line tool based on the GCS API is realized to manage the large datasets of the nanoscopy research community.
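The paper's GCS API itself is not reproduced here; as an assumption-laden Java sketch of what a client-facing ingest/download/metadata interface could look like, consider the following. The interface and method names are invented, not the published API.

```java
import java.nio.file.Path;
import java.util.Map;

/**
 * Illustrative only: a minimal client-side interface in the spirit of a Generic Client Service.
 * Method names and types are assumptions, not the actual GCS API.
 */
public interface GenericClientService {

    /** Uploads a local dataset directory and returns a repository-assigned identifier. */
    String ingest(Path datasetDirectory, Map<String, String> baseMetadata) throws Exception;

    /** Downloads the dataset with the given identifier into the target directory. */
    void download(String datasetId, Path targetDirectory) throws Exception;

    /** Returns the stored base metadata for a dataset. */
    Map<String, String> getMetadata(String datasetId) throws Exception;
}
```

A command line tool like the one mentioned in the abstract could then be a thin wrapper that maps subcommands such as ingest and download onto these calls.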
International Conference on Digital Information Management | 2011
Ariel Garcia; S. Bourov; Ahmad Hammad; Thomas Jejkal; Jens C. Otte; S. Pfeiffer; T. Schenker; C. Schmidt; J. van Wezel; Bernhard Neumair; Achim Streit
The Large Scale Data Facility (LSDF) was started at the Karlsruhe Institute of Technology (KIT) at the end of 2009 to address the growing need for value-added storage services for its data intensive experiments. The main focus of the project is to provide scientific communities producing data collections in the tera- to petabyte range with the necessary hardware infrastructure as well as with adequate value-added services and support for data management, processing, and preservation. In this work we describe the project's infrastructure and services design, as well as its metadata handling. Community-specific metadata schemes, a metadata repository, an application programming interface and a graphical tool for accessing the resources were developed to further support the processing workflows of the partner scientific communities. The analysis workflow of high throughput microscopy images for studying biomedical processes is described in detail.
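To make the high throughput microscopy workflow concrete in a toy way, the Java sketch below just iterates over an image directory and reports each image's dimensions; the directory layout and file format are assumptions, and the actual workflow described in the paper is far richer.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Illustrative only: a toy batch loop over microscopy images; paths and formats are placeholders. */
public class MicroscopyBatch {
    public static void main(String[] args) throws Exception {
        Path inputDir = Path.of(args.length > 0 ? args[0] : "images"); // placeholder directory
        try (Stream<Path> files = Files.list(inputDir)) {
            files.filter(p -> p.toString().endsWith(".png"))
                 .forEach(p -> {
                     try {
                         BufferedImage img = ImageIO.read(p.toFile());
                         if (img == null) {
                             System.err.println("unsupported image format: " + p);
                             return; // continue with the next file
                         }
                         System.out.printf("%s: %dx%d%n", p.getFileName(), img.getWidth(), img.getHeight());
                     } catch (Exception e) {
                         System.err.println("failed to read " + p + ": " + e.getMessage());
                     }
                 });
        }
    }
}
```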
International Conference on e-Science | 2006
Tim O. Müller; Thomas Jejkal; Rainer Stotzka; Michael Sutter; Volker Hartmann; Hartmut Gemmeke
The Grid is a rapidly growing new technology that will provide easy access to vast amounts of computer resources, both hardware and software. As these resources become available, more and more scientific users are interested in benefiting from them. At this time the main problem in accessing the Grid is that scientific users usually need to know a lot about Grid methods and technologies besides their own field of application. This paper describes a toolkit based on Grid Services designed especially for the field of process data processing, providing database access and management, common methods of statistical data analysis and project specific methods. The toolkit will fill to some extent the gap between high-level scientific Grid users and low-level functions in Grid environments, thus simplifying and accelerating the development of scientific Grid applications.
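The toolkit's actual Grid Service interfaces are not given in the abstract; the snippet below is only a stand-in showing the kind of statistical routine (mean and standard deviation) such a service might wrap, so that users call a plain method instead of dealing with Grid middleware directly.

```java
import java.util.Arrays;

/** Illustrative only: a tiny statistical routine of the sort a Grid Service toolkit might expose. */
public class StatisticsService {

    /** Returns the mean and (population) standard deviation of the samples. */
    public static double[] meanAndStdDev(double[] samples) {
        double mean = Arrays.stream(samples).average().orElse(Double.NaN);
        double variance = Arrays.stream(samples)
                .map(x -> (x - mean) * (x - mean))
                .average().orElse(Double.NaN);
        return new double[] { mean, Math.sqrt(variance) };
    }

    public static void main(String[] args) {
        double[] result = meanAndStdDev(new double[] { 1.0, 2.0, 3.0, 4.0 });
        System.out.printf("mean=%.2f stddev=%.2f%n", result[0], result[1]);
    }
}
```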
Proceedings of IWSG 2016 : 8th International Workshop on Science Gateways, Rome, Italy, 8th - 10th June 2016. Ed.: S. Gesing | 2016
Richard Grunzke; Volker Hartmann; Thomas Jejkal; Ajinkya Prabhune; Hendrik Herold; Aline Deicke; Alexander Hoffmann; Torsten Schrade; Gotthard Meinel; Sonja Herres-Pawlis; Rainer Stotzka; Wolfgang E. Nagel
Nowadays, the daily work of many research communities is characterized by an increasing amount and complexity of data. This makes the data increasingly difficult to manage, access and utilize, and ultimately to gain scientific insights from. At the same time, domain scientists want to focus on their science instead of IT. The solution is research data management, which stores data in a structured way to enable easy discovery for future reference. An integral part is the use of metadata. With it, data becomes accessible by its content instead of only its name and location. The use of metadata shall be as automatic and seamless as possible in order to foster high usability. Here we present the architecture and initial steps of the MASi project, whose aim is to build a comprehensive research data management service. First, it extends the existing KIT Data Manager framework by a generic programming interface and a generic graphical web interface. Advanced additional features include the integration of provenance metadata and persistent identifiers. The MASi service aims at being easily adaptable for arbitrary communities with limited effort. The requirements for the initial use cases within geography, chemistry and digital humanities are elucidated. The MASi research data management service is currently being built up to satisfy these complex and varying requirements in an efficient way. Keywords: Metadata, Communities, Research Data Management
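As an illustration of the kind of record such a service might manage, combining descriptive metadata with provenance and a persistent identifier, here is a minimal Java value type. The field names are assumptions for this sketch and do not come from KIT Data Manager or MASi.

```java
import java.time.Instant;
import java.util.List;

/**
 * Illustrative only: a minimal research data record combining descriptive metadata,
 * provenance, and a persistent identifier. All field names are invented.
 */
public record ResearchDataRecord(
        String persistentIdentifier,      // e.g. a Handle or DOI assigned at publication time
        String title,
        List<String> creators,
        Instant created,
        List<String> provenance) {        // ordered processing steps that led to this dataset

    public static void main(String[] args) {
        ResearchDataRecord rec = new ResearchDataRecord(
                "hdl:12345/demo-0001",    // placeholder identifier
                "High-resolution survey images",
                List.of("Example Researcher"),
                Instant.now(),
                List.of("acquired", "calibrated", "tiled"));
        System.out.println(rec);
    }
}
```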
Archive | 2012
Michael Sutter; Volker Hartmann; J. van Wezel; A. Trunov; Thomas Jejkal; Rainer Stotzka
Research projects produce huge amounts of data, which have to be stored and analyzed immediately after acquisition. Storing and analyzing such high data rates is normally not possible within the detectors themselves, and the situation becomes worse if several detectors with similar data rates are used within a project. In order to store the data for analysis, it has to be transferred to an appropriate infrastructure, where it is accessible at any time and from different clients. The Large Scale Data Facility (LSDF), which is currently developed at KIT, is designed to fulfill the requirements of data intensive scientific experiments and applications. Currently, the LSDF consists of a testbed installation for evaluating different technologies. From a user point of view, the LSDF is a huge data sink, providing 6 PB of storage in the initial state, and will be accessible via a couple of interfaces. As users are not interested in learning dozens of APIs for accessing data, a generic API, the ADALAPI, has been designed, providing unified interfaces for transparent access to the LSDF over different technologies. The present contribution evaluates technologies usable for the development of the LSDF to meet the requirements of various scientific projects. Also, the ADALAPI and the first GUI based on it are introduced.
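The ADALAPI itself is not shown in this abstract; the sketch below only illustrates the "one interface, many technologies" idea with invented types, dispatching on the URI scheme to an HTTP or local-file backend.

```java
import java.io.InputStream;
import java.net.URI;

/**
 * Illustrative only: a single data access interface hiding different transfer technologies.
 * The interface and factory are invented; they are not the actual ADALAPI types.
 */
public interface DataAccess {

    /** Opens a read stream for the given resource, regardless of the underlying protocol. */
    InputStream open(URI resource) throws Exception;

    /** Picks a protocol-specific implementation based on the URI scheme. */
    static DataAccess forScheme(String scheme) {
        return switch (scheme) {
            case "http", "https" -> uri -> uri.toURL().openStream();  // plain HTTP backend
            case "file" -> uri -> java.nio.file.Files.newInputStream(java.nio.file.Path.of(uri));
            default -> throw new IllegalArgumentException("unsupported scheme: " + scheme);
        };
    }
}
```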
Software Engineering and Advanced Applications | 2007
Alexander Frank; Rainer Stotzka; Thomas Jejkal; Volker Hartmann; Michael Sutter; Hartmut Gemmeke
The presented dynamic grid service architecture provides novel and comfortable access for scientific software developers and users without prior knowledge of grid technologies or even the underlying architecture. A simple Java API allows the extension of WSRF-compliant Web services by scientific software components deployed automatically on available GT4 containers. In preparation, only two services are started on each container, allowing hot deployment and simple performance analysis. GridIJ is a reference implementation of the presented architecture with a problem solving environment for image processing. A plugin with a GUI extends the free software ImageJ, providing control of the architecture, deploying scientific software components (GridPlugins), distributing data to be processed in parallel, controlling the workflow and returning the results. First tests demonstrated the simple usage of the system and its extension with GridPlugins. In principle, the performance of parallel execution scales linearly, with a small additional overhead for connecting and data transfer.
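The published GridPlugin API is not reproduced here; as a hedged sketch of the plug-in idea, an image processing step packaged so a framework can deploy and run it remotely, consider the following invented interface with a trivial example implementation.

```java
/**
 * Illustrative only: a minimal plug-in contract in the spirit of the GridPlugin idea.
 * The interface name and method signature are assumptions, not the published GridIJ API.
 */
public interface GridPlugin {

    /** Processes one image tile (given as raw pixel data) and returns the result. */
    float[] process(float[] pixels, int width, int height);

    /** A trivial example plugin: inverts pixel intensities in the range [0, 1]. */
    class Invert implements GridPlugin {
        @Override
        public float[] process(float[] pixels, int width, int height) {
            float[] out = new float[pixels.length];
            for (int i = 0; i < pixels.length; i++) {
                out[i] = 1.0f - pixels[i];
            }
            return out;
        }
    }
}
```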