Alexey Cheptsov
University of Stuttgart
Publication
Featured research published by Alexey Cheptsov.
web intelligence, mining and semantics | 2011
Matthias Assel; Alexey Cheptsov; Georgina Gallizo; Irene Celino; Daniele Dell'Aglio; Luka Bradesko; Michael J. Witbrock; Emanuele Della Valle
Recent advances in the Semantic Web community have yielded a variety of reasoning methods used to process and exploit semantically annotated data. However, most of those methods have only been validated for small, closed, trustworthy, consistent, and static domains. There is still a deep mismatch between the requirements for reasoning on a Web scale and the existing efficient reasoning algorithms, which work over restricted subsets. This paper describes the pilot implementation of LarKC (the Large Knowledge Collider), a platform that focuses on supporting large-scale reasoning over billions of structured data items in heterogeneous data sets. The architecture of LarKC allows for an effective combination of techniques coming from different Semantic Web domains by following a service-oriented approach, supported by sustainable infrastructure solutions.
Computer Standards & Interfaces | 2012
Alexey Cheptsov; Bastian Koller; Davide Adami; Franco Davoli; Szymon Mueller; Norbert Meyer; Paolo Lazzari; Stefano Salon; Johannes Watzl; Michael Schiffers; Dieter Kranzlmueller
Despite the tremendous growth in the capacity of computing and storage solutions over recent years, there is still a deep mismatch between e-Infrastructures and the e-Science applications that use instruments, sensors, and laboratory equipment. The efficiency of using instruments remotely, i.e. Remote Instrumentation, might be greatly improved by integration with existing distributed computing and storage infrastructures, such as Grids. The paper discusses major activities towards the e-Infrastructure for Remote Instrumentation - a Grid-based Information and Communication Technology environment capable of covering all the issues that arise in enabling Remote Instrumentation for e-Science applications.
intelligent data acquisition and advanced computing systems: technology and applications | 2009
Alexey Cheptsov; Bastian Koller; Dieter Kranzlmueller; Thomas Koeckerbauer; Szymon Mueller; Norbert Meyer; Franco Davoli; Davide Adami; Stefano Salon; Paolo Lazzari
Whereas the available computing resources and storage capabilities have been the most important limitation on the experiments researchers can perform, the increasing availability of high-performance computing resources provided by the Grid has allowed many e-Science communities to proceed with new, challenging experiments, especially those involving expensive and complex specialized measurement instrumentation and pervasive large-scale data acquisition platforms. Remote instrumentation, which means giving users control of distributed scientific instruments from remote locations, is an important part of the functionality that applications developed in a number of e-Science domains (among others, environmental science, earthquake engineering, and experimental science) are expected to provide. The EC-funded Deployment of Remote Instrumentation Infrastructure (DORII) project aims to establish a new e-Infrastructure that allows applications to provide remote instrumentation services in high-performance Grid computing environments. The paper presents basic aspects of deploying the Remote Instrumentation Infrastructure and of its further use with respect to the requirements of specific e-Science application fields.
advanced information networking and applications | 2013
Alexey Cheptsov
In view of explosive data growth, along with demanding QoS requirements on scalability and processing time, the Web is expected to dominate data-centric computing as early as the next decade. On the other hand, most current high performance computing infrastructures, both academic and industrial, do not support parallel Web applications, which are predominantly developed in the Java language. In response to the novel challenges of bringing data-centric supercomputing to the Web, we present a solution that introduces Java bindings for the Message Passing Interface (MPI), seamlessly integrated into one of the widely used native MPI implementations, Open MPI. Our implementation allows Java-based Semantic Web applications to be ported successfully to most modern high performance computing systems. We discuss the design features of Open MPI and introduce basic benchmark evaluations for Web applications.
Archive | 2015
Alexey Cheptsov; Bastian Koller
Modern computing technologies are becoming increasingly data-centric, addressing a variety of challenges in storing, accessing, processing, and streaming massive amounts of structured and unstructured data effectively. An important analytical task in a number of scientific and technological domains is to retrieve information from all these data, aiming to gain deeper insight into the content they represent and to obtain useful, often not explicitly stated knowledge and facts related to a particular domain of interest. The major issue is the size, structural complexity, and update frequency of the analyzed data (i.e., the ‘big data’ aspect), which makes traditional analysis techniques, tools, and infrastructures ineffective. We introduce an innovative approach to parallelising data-centric applications based on the Message-Passing Interface. In contrast to other known parallelisation technologies, our approach enables a very high utilization rate and thus low costs of using production high-performance computing and Cloud computing infrastructures. The advantages of the technique are demonstrated on a challenging Semantic Web application that performs web-scale reasoning.
web information systems engineering | 2013
Alexey Cheptsov; Axel Tenschert; Paul Schmidt; Birte Glimm; Mauricio Matthesius; Thorsten Liebig
A good deal of the digital data produced in academia, commerce and industry consists of raw, unstructured text, such as Word documents, Excel tables, emails, and web pages, which is often written in natural language. An important analytical task in a number of scientific and technological domains is to retrieve information from text data, aiming to gain deeper insight into the content it represents and to obtain useful, often not explicitly stated knowledge and facts related to a particular domain of interest. The major challenge is the size, structural complexity, and update frequency of the analysed text sets (i.e., the ‘big data’ aspect), which makes the use of traditional analysis techniques and tools impossible. We introduce an innovative approach to analysing unstructured text data. It improves on traditional data mining techniques by adopting algorithms from ontological domain modelling, natural language processing, and machine learning. The technique is inherently designed with parallelism in mind, which allows for high performance on large-scale Cloud computing infrastructures.
Archive | 2011
Alexey Cheptsov; Matthias Assel
In recent years, performance has become a key concern for a number of Java applications. For some of them, such as those from the Semantic Web domain, where the size and scale of the analyzed data pose a major challenge for a conventional computer, the use of High Performance Computing (HPC) systems is a major factor in meeting scalability and performance demands. Parallelization is a key mechanism for leveraging HPC in such applications. However, the high development effort required for a scalable parallel application has been a major obstacle to the efficient use of HPC by applications designed for serial execution only. The Message-Passing Interface (MPI) is a well-known programming standard for developing large-scale parallel applications. However, MPI has found its widest use in applications written in C and Fortran. We show how MPI can also be beneficially applied to the parallelization of Java applications. We describe a parallel implementation of a Random Indexing application that performs similarity search in large text corpora on the web, which allowed us to improve the performance by up to 33 times on as few as 16 nodes of a testbed HPC system.
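As a rough illustration of why Random Indexing lends itself to this kind of parallelization, the sketch below builds term vectors by summing sparse random index vectors of the documents in which each term occurs, then shows that splitting the documents into chunks and merging the partial sums gives the same result as a serial pass. It is a minimal sketch and not the implementation from the paper: the dimensionality, the number of non-zero entries, the toy documents, and all function names are assumptions made for illustration, and the chunks are processed sequentially in plain Python rather than by MPI processes.

```python
import math
import random
from collections import defaultdict

DIM = 64       # dimensionality of the reduced space (assumed value)
NONZERO = 4    # non-zero entries per index vector (assumed value)


def index_vector(doc_id, dim=DIM, nonzero=NONZERO):
    """Sparse ternary random vector, reproducible from the document id."""
    rng = random.Random(doc_id)
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def accumulate(docs):
    """Add each document's index vector to the vector of every term it contains."""
    term_vectors = defaultdict(lambda: [0] * DIM)
    for doc_id, text in docs:
        iv = index_vector(doc_id)
        for term in set(text.split()):
            tv = term_vectors[term]
            for i, x in enumerate(iv):
                tv[i] += x
    return term_vectors


def merge(partials):
    """Element-wise sum of partial term vectors (the role a reduction would play)."""
    total = defaultdict(lambda: [0] * DIM)
    for part in partials:
        for term, vec in part.items():
            tv = total[term]
            for i, x in enumerate(vec):
                tv[i] += x
    return total


def cosine(a, b):
    """Cosine similarity between two term vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical toy corpus: (document id, text).
docs = [
    (0, "grid computing mpi"),
    (1, "mpi java binding"),
    (2, "semantic web reasoning"),
    (3, "java semantic web"),
]

serial = accumulate(docs)

# Two "workers", each handling its own share of the documents.
chunks = [docs[0::2], docs[1::2]]
parallel = merge(accumulate(chunk) for chunk in chunks)

print(dict(serial) == dict(parallel))
print(round(cosine(parallel["semantic"], parallel["web"]), 3))
```

Because the accumulation is a plain sum, the order in which documents are processed does not matter, so each worker can handle its share independently and only the final merge requires communication.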
Archive | 2015
Alexey Cheptsov; Bastian Koller
We present a scalable, open source realization of the MPI-2 standard for Java, seamlessly integrated into Open MPI, in response to the novel challenges of supercomputing in the web application domain. A number of Java software solutions developed for the Web, such as those from Information Retrieval, the Semantic Web, and other domains, have begun to face performance and scalability challenges, for which MPI has proved to be an efficient solution in the “traditional” high performance computing languages, such as C and Fortran. We demonstrate that the native Java language design prevents MPI implementations from scaling massively on production supercomputing systems, and present a solution that overcomes the scalability issues by integrating with the native C realization of Open MPI. We also point out the design features of Open MPI that enable the proliferation of MPI into Java applications. Finally, we present some successful pilot scenarios implemented with MPI in Java and discuss future work in terms of promising Java applications of Open MPI, such as Random Indexing of large semantically annotated text sets.
Archive | 2012
Alexey Cheptsov; Bastian Koller
In recent years, the Grid has become the most progressive IT trend, enabling high-performance computing for a number of scientific domains. Large-scale infrastructures (such as the Distributed European Infrastructure for Supercomputing Applications set up in the frame of DEISA, or the Remote Instrumentation Infrastructure deployed within the DORII EU project) have put Grid technology into practice for many application areas of e-Science and have served as testbeds for performing challenging experiments, often involving results acquired from complex technical and laboratory equipment. However, as Grid technology has matured, attention has largely shifted towards optimizing the utilization of Grid resources by applications. The performance analysis module set up within the DORII project offers scientific applications an advanced tool set for optimizing performance characteristics on the Grid. The performance analysis tools adapted and the techniques elaborated within DORII for parallel applications, implemented for example by means of the Message-Passing Interface (MPI), are presented in this chapter and might be of great interest for the optimization of a wide variety of parallel scientific applications.
Archive | 2011
Alexey Cheptsov; Bastian Koller; S. Salon; P. Lazzari; J. Gracia
Over recent years, Grid computing has become a very important research area. The Grid allows the parallel execution of scientific applications on a heterogeneous infrastructure of geographically distributed resources. Parallel applications stand to benefit most from a Grid infrastructure in terms of improved performance and scalability. However, the performance gains from porting an application to the Grid are considerably limited by several factors, behind which often lie bottlenecks in the implementation of communication patterns. Based on an analysis of the OPATM-BFM oceanographic application, we elaborate a strategy for analysing communication-intensive parallel applications. This allowed us to identify several optimization proposals for the current realization of the communication pattern and to improve the performance and scalability of OPATM-BFM. As the suggested improvements are quite generic, they can potentially be useful for other parallel scientific applications.