Suren Chilingaryan | Researchain

Publication

Featured research published by Suren Chilingaryan.

IEEE International Conference on High Performance Computing Data and Analytics | 2012

UFO: A Scalable GPU-based Image Processing Framework for On-line Monitoring

Matthias Vogelgesang; Suren Chilingaryan; Tomy dos Santos Rolo; Andreas Kopmann

Current synchrotron experiments require state-of-the-art scientific cameras with sensors that provide several million pixels, each at a dynamic range of up to 16 bits and the ability to acquire hundreds of frames per second. The resulting data bandwidth of such a data stream reaches several Gigabits per second. These streams have to be processed in real-time to achieve a fast process response. In this paper we present a computation framework and middleware library that provides re-usable building blocks to implement high-performance image processing algorithms without requiring profound hardware knowledge. It is based on a graph structure of computation nodes that process image transformation kernels on either CPU or GPU using the OpenCL sub-system. This system architecture allows deployment of the framework on a large range of computational hardware, from netbooks to hybrid compute clusters. We evaluated the library with standard image processing algorithms required for high quality tomographic reconstructions. The results show that speed-ups from 7× to 37× compared to traditional CPU-based solutions can be achieved with our approach, hence providing an opportunity for real-time on-line monitoring at synchrotron beam lines.

IEEE-NPSS Real-Time Conference | 2010

A GPU-based architecture for real-time data assessment at synchrotron experiments

Suren Chilingaryan; Alessandro Mirone; Andrew Hammersley; Claudio Ferrero; Lukas Helfen; Andreas Kopmann; Tomy dos Santos Rolo; Patrik Vagovič

Current imaging experiments at synchrotron beam lines often lack real-time data assessment. X-ray imaging cameras installed at synchrotron facilities like ANKA provide millions of pixels, each with a resolution of 12 bits or more, and take up to several thousand frames per second. A given experiment can produce data sets of multiple gigabytes in a few seconds. Up to now the data is stored in local memory, transferred to mass storage, and then processed and analyzed off-line. The data quality, and thus the success of the experiment, can therefore only be judged with a substantial delay, which makes immediate monitoring of the results impossible. To optimize the usage of the micro-tomography beam-line at ANKA we have ported the reconstruction software to modern graphics adapters, which offer an enormous amount of calculation power. We were able to reduce the reconstruction time from multiple hours to just a few minutes with a sample dataset of 20 GB. Using the new reconstruction software it is possible to provide a near real-time visualization and significantly reduce the time needed for the first evaluation of the reconstructed sample. The main paradigm of our approach is 100% utilization of all system resources. The compute-intensive parts are offloaded to the GPU. While the GPU is reconstructing one slice, the CPUs are used to prepare the next one. Special attention is devoted to minimizing data transfers between the host and GPU memory and to executing I/O operations in parallel with the computations. It could be shown that for our application it is not the computational part but the data transfers that now limit the speed of the reconstruction. Several changes in the architecture of the DAQ system are proposed to overcome this second bottleneck. The article will introduce the system architecture, describe the hardware platform in detail, and analyze performance gains during the first half year of operation.
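The overlap described in this abstract (the CPUs prepare the next slice while the GPU reconstructs the current one) can be illustrated with a small producer/consumer sketch. This is not the authors' implementation: `prepare_slice`, `reconstruct_slice`, and `run_pipeline` are hypothetical names, and the "GPU" step is a plain Python stand-in so the example runs anywhere.

```python
import queue
import threading


def prepare_slice(index):
    """Stand-in for CPU-side preprocessing of one slice (hypothetical)."""
    return [index] * 4


def reconstruct_slice(prepared):
    """Stand-in for the GPU reconstruction of one slice (hypothetical)."""
    return sum(prepared)


def run_pipeline(num_slices, depth=2):
    """Reconstruct slices while a CPU thread prepares the following ones."""
    # A bounded queue limits how many prepared slices wait in host memory.
    ready = queue.Queue(maxsize=depth)
    results = []

    def producer():
        for i in range(num_slices):
            ready.put(prepare_slice(i))
        ready.put(None)  # sentinel: no more slices

    worker = threading.Thread(target=producer)
    worker.start()
    while True:
        prepared = ready.get()
        if prepared is None:
            break
        # The producer keeps filling the queue while this call is running.
        results.append(reconstruct_slice(prepared))
    worker.join()
    return results
```

Calling `run_pipeline(4)` processes four slices in order. In a real system the consumer would launch an accelerator kernel, and the queue depth would be tuned to the available host memory.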

Nuclear Instruments & Methods in Physics Research Section A-accelerators Spectrometers Detectors and Associated Equipment | 2015

Focal-plane detector system for the KATRIN experiment

J.F. Amsbaugh; J. Barrett; A. Beglarian; Till Bergmann; H. Bichsel; L. I. Bodine; J. Bonn; N.M. Boyd; T.H. Burritt; Z. Chaoui; Suren Chilingaryan; T.J. Corona; P. J. Doe; J.A. Dunmore; S. Enomoto; Joseph A. Formaggio; F.M. Fränkle; D. Furse; H. Gemmeke; F. Glück; F. Harms; G. Harper; J. Hartmann; M. A. Howe; A. Kaboth; J. Kelsey; M. Knauer; Andreas Kopmann; M. Leber; E.L. Martin

The focal-plane detector system for the KArlsruhe TRItium Neutrino (KATRIN) experiment consists of a multi-pixel silicon p-i-n-diode array, custom readout electronics, two superconducting solenoid magnets, an ultra high-vacuum system, a high-vacuum system, calibration and monitoring devices, a scintillating veto, and a custom data-acquisition system. It is designed to detect the low-energy electrons selected by the KATRIN main spectrometer. We describe the system and summarize its performance after its final installation.

IEEE Transactions on Nuclear Science | 2015

A PCIe DMA Architecture for Multi-Gigabyte Per Second Data Transmission

Lorenzo Rota; Michele Caselle; Suren Chilingaryan; Andreas Kopmann; M. Weber

We developed a direct memory access (DMA) engine compatible with the Xilinx PCI Express (PCIe) core to provide a high-performance and low-occupancy alternative to commercial solutions. In order to maximize the PCIe throughput while minimizing FPGA resource utilization, the DMA engine adopts a novel strategy where the DMA address list is stored inside the FPGA and not in the central memory of the host CPU. The FPGA design package is complemented with simple register access to control the DMA engine by a Linux driver. The design is compatible with Xilinx FPGA Families 6 and 7, and operates with the Xilinx PCIe endpoint Generation 1 and 2 with all lane configurations (x1, x2, x4, x8). A multi-engine architecture is also presented, where two x8-lane cores are used in parallel together with a PCIe bridge, to fully exploit the capabilities of a PCIe Gen2 x16-lane link. A data throughput of 3461 MBytes/s has been achieved with a single PCIe Gen2 x8-lane endpoint. If the dual-engine architecture is used, the throughput is increased up to 6920 MBytes/s. The presented DMA is currently used in several experiments at the ANKA synchrotron light source.
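To put the reported throughput figures in context, the short calculation below compares them with the raw link bandwidth of PCIe Gen2, which signals at 5 GT/s per lane with 8b/10b encoding. It is a sketch under stated assumptions: 1 MByte is taken as 10^6 bytes, and the helper names are hypothetical. Packet-level protocol overhead is not modeled, so the raw figure is an upper bound rather than an achievable payload rate.

```python
# Per-lane signalling rates in transfers per second (PCIe Gen1 and Gen2).
TRANSFERS_PER_S = {1: 2.5e9, 2: 5.0e9}

# Gen1 and Gen2 use 8b/10b encoding: 8 payload bits per 10 transferred bits.
PAYLOAD_BITS, LINE_BITS = 8, 10


def raw_bandwidth_mb_s(generation, lanes):
    """Raw link bandwidth in MB/s (assumption: 1 MB = 10**6 bytes)."""
    bits_per_s = TRANSFERS_PER_S[generation] * lanes * PAYLOAD_BITS / LINE_BITS
    return bits_per_s / 8 / 1e6


def link_efficiency(measured_mb_s, generation, lanes):
    """Fraction of the raw link bandwidth reached by a measured throughput."""
    return measured_mb_s / raw_bandwidth_mb_s(generation, lanes)


if __name__ == "__main__":
    print(raw_bandwidth_mb_s(2, 8))         # raw bandwidth of a Gen2 x8 link
    print(link_efficiency(3461, 2, 8))      # single-engine figure from the abstract
    print(link_efficiency(6920, 2, 16))     # dual-engine figure from the abstract
```

Under these assumptions a Gen2 x8 link carries at most 4000 MB/s, so both reported figures correspond to roughly 87% of the raw link bandwidth.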

Journal of Instrumentation | 2014

An ultra-fast data acquisition system for coherent synchrotron radiation with terahertz detectors

Michele Caselle; M. Balzer; Suren Chilingaryan; M. Hofherr; V. Judin; Andreas Kopmann; N. Smale; Petra Thoma; Stefan Wuensch; Anke-Susanne Müller; M. Siegel; M. Weber

The recording of coherent synchrotron radiation requires data acquisition systems with a temporal resolution of tens of picoseconds. This paper describes a new real-time and high-accuracy data acquisition system suitable for recording individual ultra-short pulses generated by a fast terahertz (THz) detector (e.g. YBCO, NbN, Zero Biased Schottky Diode). The system consists of a fast sampling board combined with a high data throughput readout. The first board is designed for sampling the fast pulse signals with a full width at half maximum (FWHM) between a few tens and one hundred picoseconds with a minimum sampling time of 3 ps. The high data throughput board consists of a PCIe-Bus Master DMA architecture used for fast data transfer up to 3 GByte/s. The full readout chain with fast THz detectors and the acquisition system has been successfully tested at the synchrotron ANKA. An overview of the electronics system and preliminary results with multi-bunch filling pattern will be presented.

Journal of Physics: Conference Series | 2010

Advanced data extraction infrastructure: Web based system for management of time series data

Suren Chilingaryan; A. Beglarian; Andreas Kopmann; S. Vöcking

During operation of high energy physics experiments a large amount of slow control data is recorded. It is necessary to examine all collected data, checking the integrity and validity of measurements. With the growing maturity of AJAX technologies it has become possible to construct sophisticated interfaces using web technologies only. Our solution for handling time series, generally slow control data, has a modular architecture: a backend system for data analysis and preparation, a web service interface for data access, and a fast AJAX web display. In order to provide fast interactive access, the time series are aggregated over time slices of a few predefined lengths. The aggregated values are stored in a temporary caching database and then used to create generalizing data plots. These plots may include an indication of data quality and are generated within a few hundred milliseconds even if very high data rates are involved. The extensible export subsystem provides data in multiple formats including CSV, Excel, ROOT, and TDMS. The search engine can be used to find periods of time where the readings of selected sensors fall into the specified ranges. Use of the caching database allows most such lookups to be performed within a second. Based on this functionality a web interface facilitating fast (Google-maps style) navigation through the data has been implemented. The solution is currently used by several slow control systems at the Test Facility for Fusion Magnets (TOSKA) and the Karlsruhe Tritium Neutrino (KATRIN) experiment.
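The caching idea in this abstract (aggregating time series over slices of a few predefined lengths and answering range searches from the aggregates) can be sketched as follows. This is an illustration only, not the system's actual schema: the slice lengths, the stored statistics, and all function names are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical slice lengths in seconds; the abstract only says
# "a few predefined lengths".
SLICE_LENGTHS = (60, 3600)


def aggregate(samples, slice_length):
    """Group (timestamp, value) samples into fixed slices and summarise each."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        slice_start = (timestamp // slice_length) * slice_length
        buckets[slice_start].append(value)
    return {
        start: {"min": min(vals), "max": max(vals),
                "mean": mean(vals), "count": len(vals)}
        for start, vals in sorted(buckets.items())
    }


def build_cache(samples):
    """Build one aggregation level per predefined slice length."""
    return {length: aggregate(samples, length) for length in SLICE_LENGTHS}


def find_slices_in_range(cache_level, low, high):
    """Return starts of slices whose values all lie within [low, high]."""
    return [start for start, stats in cache_level.items()
            if low <= stats["min"] and stats["max"] <= high]
```

Because each slice stores its minimum and maximum, a range search only has to inspect one small record per slice instead of every raw sample, which is one plausible way such lookups could stay fast. A plot at a given zoom level would read from the aggregation level whose slice length best matches the visible time span.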

IEEE-NPSS Real-Time Conference | 2014

A new DMA PCIe architecture for Gigabyte data transmission

Lorenzo Rota; Michele Caselle; Suren Chilingaryan; Andreas Kopmann; Marc Weber

PCI Express (PCIe) is a high-speed serial point-to-point interconnect that delivers high-performance data throughput. KIT has developed a Direct Memory Access (DMA) engine compatible with the Xilinx PCIe core to provide a smart and low-occupancy alternative logic to expensive commercial solutions. In order to maximize the PCIe throughput the DMA engine adopts a new strategy, where the DMA descriptor list is stored inside the FPGA and not in the central memory system. The FPGA design package is complemented with a simple register access to control the DMA engine by a Linux driver. A handshaking sequence between the DMA engine and the Linux driver ensures that no errors occur, even in data transfers of several hundred gigabytes. The design has been tested with Xilinx FPGA Families 6 and 7, and operates with the Xilinx PCIe endpoint generation 1 and 2 with all lane configurations (x1, x2, x4, x8, x16). Data throughput of more than 3.4 GB/s has been achieved with a PCIe Gen 2 ×8 lanes endpoint. The proposed DMA is currently used in several experiments at the ANKA synchrotron light source.

IEEE International Conference on High Performance Computing Data and Analytics | 2011

Poster: a GPU-based architecture for real-time data assessment at synchrotron experiments

Suren Chilingaryan; Andreas Kopmann; Alessandro Mirone; Tomy dos Santos Rolo; Matthias Vogelgesang

Journal of Instrumentation | 2016

A high-throughput readout architecture based on PCI-Express Gen3 and DirectGMA technology

Lorenzo Rota; Matthias Vogelgesang; L.E. Ardila Perez; Michele Caselle; Suren Chilingaryan; T. Dritschler; N. Zilio; Andreas Kopmann; M. Balzer; M. Weber

Modern physics experiments produce multi-GB/s data rates. Fast data links and high performance computing stages are required for continuous data acquisition and processing. Because of their intrinsic parallelism and computational power, GPUs emerged as an ideal solution to process this data in high performance computing applications. In this paper we present a high-throughput platform based on direct FPGA-GPU communication. The architecture consists of a Direct Memory Access (DMA) engine compatible with the Xilinx PCI-Express core, a Linux driver for register access, and high-level software to manage direct memory transfers using AMD's DirectGMA technology. Measurements with a Gen3 x8 link show a throughput of 6.4 GB/s for transfers to GPU memory and 6.6 GB/s to system memory. We also assess the possibility of using the architecture in low latency systems: preliminary measurements show a round-trip latency as low as 1 μs for data transfers to system memory, while the additional latency introduced by OpenCL scheduling is the current limitation for GPU based systems. Our implementation is suitable for real-time DAQ system applications ranging from photon science and medical imaging to High Energy Physics (HEP) systems.

International Joint Conference on Computer Vision Imaging and Computer Graphics Theory and Applications | 2017

WAVE: A 3D Online Previewing Framework for Big Data Archives

Nicholas Tan Jerome; Suren Chilingaryan; Andrei Shkarin; Andreas Kopmann; Michael Zapf; Alexander Lizin; Till Bergmann

With data sets growing beyond terabytes or even petabytes in scientific experiments, there is a trend of keeping data at storage facilities and providing remote cloud-based services for analysis. However, accessing these data sets remotely is cumbersome due to additional network latency and incomplete metadata description. To ease data browsing on remote data archives, our WAVE framework applies intelligent cache management to provide scientists with interactive visual feedback on large data sets. In this paper, we present methods to reduce the data set size while preserving visual quality. Our framework supports volume rendering and surface rendering for data inspection and analysis. Furthermore, we enable a zoom-on-demand approach, where a selected volumetric region is reloaded in higher detail. Finally, we evaluated the WAVE framework using a data set from entomology research.
