
Publication


Featured research published by A. Kazarov.


IEEE Transactions on Nuclear Science | 2004

Online software for the ATLAS test beam data acquisition system

I. Alexandrov; A. Amorim; E. Badescu; M. Barczyk; D. Burckhart-Chromek; M. Caprini; J.D.S. Conceicao; J. Flammer; M. Dobson; R. Hart; R. W. L. Jones; A. Kazarov; S. Kolos; V. M. Kotov; D. Klose; D. Liko; J. G. R. Lima; Levi Lúcio; L. Mapelli; M. Mineev; Luis G. Pedro; Y. F. Ryabov; I. Soloviev; H. Wolters

The Online Software is the global system software of the ATLAS data acquisition (DAQ) system, responsible for the configuration, control and information sharing of the ATLAS DAQ system. A test beam facility offers the ATLAS detectors the possibility to study important performance aspects as well as to proceed on the way to the final ATLAS DAQ system. Last year, three subdetectors of ATLAS, separately and combined, successfully used the Online Software for the control of their data taking. In this paper, we describe the different components of the Online Software together with their usage at the ATLAS test beam.


Archive | 2004

Control in the ATLAS TDAQ System

D. Liko; I. Soloviev; R. W. L. Jones; S. Kolos; J. Flammer; Yu. Ryabov; A. Kazarov; M. Mineev; L. Mapelli; I. Alexandrov; S. Korobov; D. Burckhart-Chromek; Kotov; M. Caprini; E. Badescu; N. Fiuza de Barros; A. Amorim; D. Klose; Luis G. Pedro; M. Dobson

The unprecedented size and complexity of the ATLAS TDAQ system require a comprehensive and flexible control system. Its role ranges from the so-called run control, e.g. starting and stopping the data taking, to error handling and fault tolerance. It also includes initialization and verification of the overall system. Following the traditional approach, a hierarchical system of customizable controllers has been proposed. For the final system, all functionality will therefore be available in a distributed manner, with the possibility of local customization. After a technology survey, the open-source expert system CLIPS was chosen as the basis for the implementation of the supervision and verification system. The CLIPS interpreter has been extended to provide a general control framework. Other ATLAS Online software components have been integrated as plug-ins and provide the mechanisms for configuration and communication. Several components have been implemented sharing this technology. The dynamic behavior of each individual component is fully described by its rules, while the framework rests on a common implementation. During this year, these components have been subjected to scalability tests up to the full system size. Encouraging results are presented and validate the technology choice.
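The rule-driven controller behavior described in the abstract can be illustrated with a minimal sketch. This is a toy forward-chaining rule engine in Python, not CLIPS, and the state names (`child1:configured`, etc.) are hypothetical; the real system extends the CLIPS interpreter with ATLAS Online plug-ins.

```python
# Minimal sketch of a rule-driven controller in the spirit of the
# CLIPS-based approach (illustrative only; the actual framework
# extends the CLIPS interpreter with ATLAS Online plug-ins).

class RuleEngine:
    def __init__(self):
        self.facts = set()
        self.rules = []  # (conditions, action) pairs

    def add_rule(self, conditions, action):
        self.rules.append((frozenset(conditions), action))

    def assert_fact(self, fact):
        self.facts.add(fact)
        self._run()

    def _run(self):
        # Forward chaining: fire every rule whose conditions hold,
        # until no rule derives any new fact.
        changed = True
        while changed:
            changed = False
            for conditions, action in self.rules:
                if conditions <= self.facts:
                    derived = action(self.facts)
                    if derived and not derived <= self.facts:
                        self.facts |= derived
                        changed = True

engine = RuleEngine()
# Hypothetical run-control rule: when all children report the
# configured state, the controller itself becomes configured.
engine.add_rule(
    {"child1:configured", "child2:configured"},
    lambda facts: {"controller:configured"},
)
engine.assert_fact("child1:configured")
engine.assert_fact("child2:configured")
print("controller:configured" in engine.facts)  # True
```

The point of the rule-based formulation is that a controller's dynamic behavior lives entirely in its rules, while the engine itself is a common, reusable implementation.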


Journal of Physics: Conference Series | 2012

The AAL project: automated monitoring and intelligent analysis for the ATLAS data taking infrastructure

A. Kazarov; G. Lehmann Miotto; L. Magnoni

The Trigger and Data Acquisition (TDAQ) system of the ATLAS experiment at CERN is the infrastructure responsible for collecting and transferring ATLAS experimental data from the detectors to the mass storage system. It relies on a large, distributed computing environment, including thousands of computing nodes with thousands of applications running concurrently. In such a complex environment, information analysis is fundamental for controlling application behavior, error reporting and operational monitoring. During data taking runs, streams of messages sent by applications via the message reporting system, together with data published by applications via information services, are the main sources of knowledge about the correctness of running operations. The flow of data produced (at an average rate of O(1-10 kHz)) is constantly monitored by experts to detect problems or misbehavior. This requires strong competence and experience in understanding and discovering problems and root causes, and often the meaningful information lies not in a single message or update, but in the aggregated behavior over a certain time-line. The AAL project aims to reduce manpower needs and to assure a constantly high quality of problem detection by automating most of the monitoring tasks and providing real-time correlation of data-taking and system metrics. The project combines technologies coming from different disciplines; in particular, it leverages an Event-Driven Architecture to unify the flow of data from the ATLAS infrastructure, a Complex Event Processing (CEP) engine for correlation of events, and a message-oriented architecture for component integration. The project is composed of two main components: a core processing engine, responsible for correlation of events through expert-defined queries, and a web-based front-end to present real-time information and interact with the system. All components work in a loosely coupled, event-based architecture, with a message broker centralizing all communication between modules. The result is an intelligent system able to extract and compute relevant information from the flow of operational data and provide real-time feedback to human experts, who can promptly react when needed. The paper presents the design and implementation of the AAL project, together with the results of its usage as an automated monitoring assistant for the ATLAS data taking infrastructure.
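The kind of correlation a CEP query performs can be sketched as follows: count matching events inside a sliding time window and raise an alert when a threshold is crossed. This is an illustrative Python sketch with hypothetical severities and thresholds; the actual AAL core uses a dedicated CEP engine with expert-defined queries and a message broker.

```python
from collections import deque

# Minimal sketch of windowed event correlation, in the spirit of a
# CEP query (illustrative; AAL uses a dedicated CEP engine).

class WindowedAlert:
    def __init__(self, window_s, threshold):
        self.window_s = window_s      # sliding window length, seconds
        self.threshold = threshold    # events needed to fire
        self.events = deque()         # timestamps of matching events

    def on_event(self, ts, severity):
        """Feed one event; return True if the alert fires."""
        if severity != "ERROR":
            return False
        self.events.append(ts)
        # Drop events that fell out of the time window.
        while self.events and ts - self.events[0] > self.window_s:
            self.events.popleft()
        return len(self.events) >= self.threshold

alert = WindowedAlert(window_s=10, threshold=3)
stream = [(0, "INFO"), (1, "ERROR"), (2, "ERROR"),
          (15, "ERROR"), (16, "ERROR"), (17, "ERROR")]
fired = [ts for ts, sev in stream if alert.on_event(ts, sev)]
print(fired)  # [17]
```

The two early errors at t=1 and t=2 age out of the window before the later burst, so only the aggregated behavior in a time-line, not any single message, triggers the alert.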


IEEE-NPSS Real-Time Conference | 2012

Use of Expert System and Data Analysis Technologies in Automation of Error Detection, Diagnosis and Recovery for the ATLAS Trigger-DAQ Control Framework

A. Kazarov; Alina Corso Radu; L. Magnoni; Giovanna Lehmann Miotto

The Trigger and Data Acquisition (TDAQ) system of the ATLAS experiment at the LHC at CERN is a very complex distributed computing system, composed of O(10000) applications running on a farm of commodity CPUs. The system has been designed and developed by dozens of software engineers and physicists since the end of the 1990s, and it will be maintained in operational mode during the lifetime of the experiment. The TDAQ system is controlled by the Control framework, which includes a set of software components and tools used for system configuration, distributed process handling, synchronization of Run Control state transitions, etc. The huge flow of operational monitoring data produced is constantly monitored by operators and experts in order to detect problems or misbehavior. Given the scale of the system and the rates of data to be analyzed, the automation of the Control framework functionality in the areas of operational monitoring, system verification, error detection and recovery is a strong requirement. The paper describes the requirements, technology choices, high-level design and some implementation aspects of advanced Control tools based on knowledge-base technologies. The main aim of these tools is to store and reuse developers' expertise and operational knowledge in order to help TDAQ operators control the system with maximum efficiency during the lifetime of the experiment.


IEEE Transactions on Nuclear Science | 2007

A Rule-Based Verification and Control Framework in Atlas Trigger-DAQ

A. Kazarov; A. Corso-Radu; Giovanna Lehmann Miotto; J. Sloper; Y. Ryabov

In order to meet the requirements of ATLAS experiment data taking, the Trigger-DAQ (TDAQ) system is composed of O(10000) applications running on more than 2600 computers in a network. With such a system size, software and hardware failures are quite frequent. To minimize system downtime, the Trigger-DAQ control system shall include advanced verification and diagnostics facilities. The operator shall use tests and the expertise of the TDAQ and detector developers in order to diagnose and recover from errors, automatically where possible. The TDAQ control system is built as a distributed tree of controllers, where the behavior of each controller is defined in a rule-based language allowing easy customization. The control system also includes a verification framework which allows users to develop and configure tests for any component in the system, with different levels of complexity. It can be used as a stand-alone test facility for a small detector installation, as part of the general TDAQ initialization procedure, and for diagnosing problems which may occur during run time. The system is currently being used in TDAQ commissioning at the ATLAS experimental zone and by subdetectors for stand-alone verification of the detector hardware before it is finally installed.


IEEE-NPSS Real-Time Conference | 2005

Deployment and use of the ATLAS DAQ in the combined test beam

S. Gadomski; M. Abolins; I. Alexandrov; A. Amorim; C. Padilla-Aranda; E. Badescu; N. Barros; H. P. Beck; R. E. Blair; D. Burckhart-Chromek; M. Caprini; M. Ciobotaru; P. Conde-Muíño; A. Corso-Radu; M. Diaz-Gomez; R. Dobinson; M. Dobson; Roberto Ferrari; M. L. Ferrer; David Francis; S. Gameiro; B. Gorini; M. Gruwe; S. Haas; C. Haeberli; R. Hauser; R. E. Hughes-Jones; M. Joos; A. Kazarov; D. Klose

The ATLAS collaboration at CERN operated a combined test beam (CTB) from May until November 2004. The prototype of the ATLAS data acquisition system (DAQ) was used to integrate the other subsystems into a common CTB setup. Data were collected synchronously from all the ATLAS detectors, which represented nine different detector technologies. The electronics and software of the first level trigger were used to trigger the setup. Event selection algorithms of the high level trigger were integrated with the system and were tested with real detector data. The possibility of operating a remote event filter farm synchronized with the ATLAS TDAQ was also tested. Event data, as well as detector conditions data, were made available for offline analysis.


IEEE-NPSS Real-Time Conference | 2007

The ATLAS DAQ System Online Configurations Database Service Challenge

J. Almeida; M. Dobson; A. Kazarov; Giovanna Lehmann Miotto; J. Sloper; I. Soloviev; Ret Torres

This paper describes the challenging requirements on the configuration service for the ATLAS experiment at CERN. It presents the status of the implementation and testing one year before the start of data taking, providing details of:

1. the capabilities of the underlying OKS object manager to store and archive configuration descriptions, and its user and programming interfaces;
2. the organization of configuration descriptions for different types of data taking runs and combinations of participating sub-detectors;
3. the scalable architecture to support simultaneous access to the service by thousands of processes during the online configuration stage of ATLAS;
4. the experience with the usage of the configuration service during large scale tests, test beam, commissioning and technical runs.

The paper also presents the pros and cons of the chosen object-oriented implementation compared with solutions based on pure relational database technologies, and explains why, after several years of usage, we continue with our approach.
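The object-oriented style of configuration description can be illustrated with a minimal sketch. The class and object names (`Partition`, `Segment`, `ros-1`, etc.) are hypothetical placeholders; OKS itself is a C++ object manager with its own schema and data files, archiving, and generated access libraries.

```python
# Minimal sketch of an object-oriented configuration description,
# in the spirit of OKS (illustrative; the real service is the OKS
# object manager in C++ with schema/data files and archiving).

class ConfigObject:
    def __init__(self, uid, class_name, **attrs):
        self.uid = uid
        self.class_name = class_name
        self.attrs = attrs          # typed attributes
        self.relations = {}         # name -> list of ConfigObject

    def link(self, name, *objects):
        self.relations.setdefault(name, []).extend(objects)

# Hypothetical partition description: a run configuration that
# references its segments, which in turn reference applications.
app = ConfigObject("ros-1", "Application", binary="ros_main", port=20000)
segment = ConfigObject("ROS-segment", "Segment")
segment.link("applications", app)
partition = ConfigObject("test-partition", "Partition", run_type="Physics")
partition.link("segments", segment)

# A client traverses the object graph instead of joining tables.
apps = [a for s in partition.relations["segments"]
        for a in s.relations["applications"]]
print([a.uid for a in apps])  # ['ros-1']
```

This graph traversal, rather than relational joins, is one face of the object-oriented versus relational trade-off the paper discusses.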


IEEE-NPSS Real-Time Conference | 2014

Performance of Splunk for the TDAQ information service at the ATLAS experiment

Y. Yasu; A. Kazarov

The ATLAS Trigger and Data Acquisition (TDAQ) system is a large, distributed system composed of several thousand interconnected computers and tens of thousands of software processes. Monitoring data produced by multiple sources are selected, aggregated and correlated to perform the analysis of the monitored data, which can then be visualized and presented to the user. Any system implementing these functions has to be flexible in order to adapt to the amount of data produced and requested by the users for analysis and visualization. Due to the size of the ATLAS TDAQ system, scalability is also important from the performance point of view. Splunk, a commercial product of Splunk Inc., is a general-purpose search, analysis and reporting engine and a distributed, non-relational, semi-structured database for time-series text data. This paper describes the evaluation of Splunk in terms of both functionality and performance.


IEEE-NPSS Real-Time Conference | 2014

A scalable and reliable message transport service for the ATLAS Trigger and Data Acquisition system

A. Kazarov; M. Caprini; S. Kolos; Giovanna Lehmann Miotto; I. Soloviev

The ATLAS Trigger and Data Acquisition (TDAQ) system is a large distributed computing system composed of several thousand interconnected computers and tens of thousands of applications. During a run, TDAQ applications produce a large volume of control and information messages at variable rates, addressed to TDAQ operators or to other applications. Reliable, fast and accurate delivery of the messages is important for the functioning of the whole TDAQ system. The Message Transport Service (MTS) provides facilities for the reliable transport, filtering and routing of the messages, based on the publish-subscribe-notify communication pattern with content-based message filtering. During the ongoing LHC shutdown, MTS was re-implemented, taking into account important requirements like reliability, scalability and performance, the handling of slow subscribers, and the simplicity of the design and implementation. MTS uses CORBA middleware, a common layer for the TDAQ infrastructure, and provides sending/subscribing APIs in the Java and C++ programming languages. The paper presents the design and implementation details of MTS, as well as the results of performance and scalability tests executed on a computing farm, with a number of workers and working conditions that reproduced a realistic TDAQ environment during ATLAS operations.
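The publish-subscribe-notify pattern with content-based filtering can be sketched as follows. This is an illustrative in-process Python sketch with hypothetical application names and message fields; MTS itself is built on CORBA middleware with C++ and Java APIs and a distributed broker.

```python
# Minimal sketch of publish-subscribe-notify with content-based
# filtering, in the spirit of MTS (illustrative; the real service
# uses CORBA middleware with C++/Java APIs).

class Broker:
    def __init__(self):
        self.subscribers = []  # (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        """Register a content-based filter and a notification callback."""
        self.subscribers.append((predicate, callback))

    def publish(self, message):
        # Route the message only to subscribers whose filter accepts it.
        for predicate, callback in self.subscribers:
            if predicate(message):
                callback(message)

broker = Broker()
received = []
# Hypothetical subscription: an operator console interested only in
# FATAL messages, from any application.
broker.subscribe(lambda m: m["severity"] == "FATAL", received.append)
broker.publish({"app": "HLTSV", "severity": "INFO", "text": "run started"})
broker.publish({"app": "DCM-12", "severity": "FATAL", "text": "disk full"})
print(len(received))  # 1
```

Because filtering happens on message content at the broker, publishers need not know who is listening, which is what decouples the thousands of TDAQ applications from their consumers.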


Journal of Physics: Conference Series | 2012

Applications of advanced data analysis and expert system technologies in the ATLAS Trigger-DAQ Controls framework

G. Avolio; A. Corso Radu; A. Kazarov; G. Lehmann Miotto; L. Magnoni

The Trigger and Data Acquisition (TDAQ) system of the ATLAS experiment is a very complex distributed computing system, composed of more than 20000 applications running on more than 2000 computers. The TDAQ Controls system has to guarantee the smooth and synchronous operation of all the TDAQ components and has to provide the means to minimize the downtime of the system caused by runtime failures. During data taking runs, streams of information messages sent or published by running applications are the main sources of knowledge about the correctness of running operations. The huge flow of operational monitoring data produced is constantly monitored by experts in order to detect problems or misbehaviours. Given the scale of the system and the rates of data to be analyzed, the automation of the system functionality in the areas of operational monitoring, system verification, error detection and recovery is a strong requirement. To accomplish its objective, the Controls system includes several high-level components based on advanced software technologies, namely a rule-based Expert System and Complex Event Processing engines. The chosen techniques make it possible to formalize, store and reuse the knowledge of experts, and thus to assist the shifters in the ATLAS control room during data-taking activities.

Collaboration


Dive into A. Kazarov's collaborations.

Top Co-Authors

M. Caprini (Politehnica University of Bucharest)
I. Soloviev (Petersburg Nuclear Physics Institute)
E. Badescu (Politehnica University of Bucharest)
I. Alexandrov (Joint Institute for Nuclear Research)