Péter Szolgay

The Catholic University of America

Publication

Featured research published by Péter Szolgay.

European Conference on Circuit Theory and Design | 2007

Vision based human-machine interface via hand gestures

Norbert Bérci; Péter Szolgay

There is a great need for alternative human-machine interfaces in several application areas. In this paper we present the first stage of a hand gesture recognition system whose primary purpose is to replace classical input peripherals such as the mouse and the keyboard. The system works on visual input, and the processing has been implemented on the Bi-i visual processing architecture, which is based on the cellular neural/nonlinear network paradigm. Its properties, most notably its real-time performance, also allow it to be used in security, military, medical, surgical and public media applications.

Computer Physics Communications | 2014

The density matrix renormalization group algorithm on kilo-processor architectures: Implementation and trade-offs

Csaba Nemes; Gergely Barcza; Zoltán Nagy; Örs Legeza; Péter Szolgay

In the numerical analysis of strongly correlated quantum lattice models, one of the leading algorithms developed to balance the size of the effective Hilbert space and the accuracy of the simulation is the density matrix renormalization group (DMRG) algorithm, in which the run-time is dominated by the iterative diagonalization of the Hamilton operator. As the most time-consuming step of the diagonalization can be expressed as a list of dense matrix operations, the DMRG is an appealing candidate to fully utilize the computing power residing in novel kilo-processor architectures. In this paper a smart hybrid CPU–GPU implementation is presented, which exploits the power of both the CPU and the GPU and tolerates problems exceeding the GPU memory size. Furthermore, a new CUDA kernel has been designed for asymmetric matrix–vector multiplication to accelerate the rest of the diagonalization. Besides the evaluation of the GPU implementation, the practical limits of an FPGA implementation are also discussed.

European Conference on Circuit Theory and Design | 2011

Efficient mapping of mathematical expressions to FPGAs: Exploring different design methodologies

Csaba Nemes; Zoltán Nagy; Péter Szolgay

Computationally intensive problems can be represented with data-flow graphs and automatically transformed to locally controlled floating-point units via partitioning. In theory, the lack of global control signals enables a high-performance implementation; however, placing and routing the partitioned circuits is not trivial. In practice, to create a high-performance implementation, the clusters should be placed efficiently on the surface of an FPGA using the physical constraining feature of CAD tools. In this paper a new partitioning strategy is presented which not only minimizes the number of cut nets but also produces a partition that can be mapped without long interconnections between the clusters. The new strategy is demonstrated on the automatic circuit generation from a complex mathematical expression. The proposed partitioning method produces more cut nets than common strategies; however, the resulting partition can be easily mapped and operates at a significantly higher frequency.

International Symposium on Circuits and Systems | 2009

Towards a gesture based human-machine interface: Fast 3D tracking of the human fingers on high speed smart camera computers

Norbert Bérci; Péter Szolgay

Rich multimedia and 3D graphical UIs need new methods for efficient input. The current mainstream human-machine interfaces are the keyboard and the mouse, but touch-based ones have recently been emerging. Some more esoteric counterparts are also used, mainly in virtual reality systems. These current devices have some properties that make them unusable (attached wires, or gloves or other markers that must be worn) and lack some others (touchless and distant control). Our project aims to develop a system based on 3D visual recognition of human hand gestures for both home and industrial applications. In this paper we present the fundamental part of this system: the 3D finger tracking algorithm and its implementation properties on smart camera systems.

2014 14th International Workshop on Cellular Nanoscale Networks and their Applications (CNNA), pp. 1-2 | 2014

FPGA-based simulation of 3D light propagation

András Kiss; Zoltán Nagy; Péter Szolgay; Tamás Roska; G. Csaba; Xiaobo Sharon Hu; Wolfgang Porod

In this paper, we describe how to emulate 3D wave dynamics on a 2D FPGA-based architecture. The algorithm is based on the paraxial Helmholtz equation, which describes beam propagation through different media with different refractive indices. To solve this wave propagation equation numerically, the FPGA-accelerated hardware operates with spatially varying templates. The FPGA-based wave-equation solver is highly parallelizable, so the resulting algorithm will also be amenable to mega-core architectures.

2014 14th International Workshop on Cellular Nanoscale Networks and Their Applications, CNNA 2014 | 2014

Data locality-based mesh partitioning methods for dataflow machines

Antal Hiba; Zoltán Nagy; Miklos Ruszinko; Péter Szolgay

Power efficiency has become an important factor in High Performance Computing (HPC). FPGA-based dataflow machines are the best candidates for power-efficient computing because they maximize memory bandwidth utilization and allow user-defined optimal caching. However, they require input data streams with optimized data locality. This paper focuses on the possibilities of novel mesh partitioning techniques, which provide partitions with better data locality.

2014 14th International Workshop on Cellular Nanoscale Networks and Their Applications, CNNA 2014 | 2014

Emulating optically inspired massively parallel non-Boolean operators on FPGA

András Kiss; Zoltán Nagy; Péter Szolgay; Tamás Roska; Gyorgy Csaba; Xiaobo Sharon Hu; Wolfgang Porod

In this paper, we demonstrate two optically inspired massively parallel non-Boolean operators running on FPGA. One of the algorithms is based on the paraxial Helmholtz equation, which describes beam propagation through different media with different refractive indices; the other is based on the concepts of optical computing, where quasi-optical wave equations are solved numerically using FPGA-accelerated hardware. The second algorithm performs holographic pattern matching. Both FPGA-based implementations are highly parallelizable; consequently, they are also amenable to mega-core architectures.

European Conference on Circuit Theory and Design | 2013

Implementation trade-offs of the density matrix renormalization group algorithm on kilo-processor architectures

Csaba Nemes; Gergely Barcza; Zoltán Nagy; Örs Legeza; Péter Szolgay

Numerical analysis of strongly correlated quantum lattice models is of great importance in quantum physics. The exponentially growing size of the Hilbert space makes these computations difficult; however, sophisticated algorithms have been developed to balance the size of the effective Hilbert space and the accuracy of the simulation. One of these methods is the density matrix renormalization group (DMRG) algorithm, which has become the leading numerical tool in the study of low-dimensional lattice problems of current interest. In the algorithm a computationally demanding problem can be translated to a list of dense matrix operations, which makes it an ideal application to fully utilize the computing power residing in both current multi-core processors and novel kilo-processor architectures.

2012 13th International Workshop on Cellular Nanoscale Networks and their Applications | 2012

Automatic generation of locally controlled arithmetic unit via floorplan based partitioning

Csaba Nemes; Zoltán Nagy; Péter Szolgay

In this paper a framework for generating a locally controlled arithmetic unit is presented, including graph generation from a mathematical expression, graph partitioning to determine the locally controlled parts of the design, and VHDL generation. The output of the framework is a pipelined architecture containing locally controlled groups of floating-point units. It is demonstrated that both the partitioning and the placement aspects of the design have to be considered to obtain a high-speed circuit. In a well-placeable design, locally controlled groups can be mapped to the FPGA in such a way that only neighboring groups communicate with each other. In the presented algorithm an initial floorplan of the floating-point units is produced, and a novel graph partitioning representation is used for partitioning the floating-point units to obtain a well-placeable design. The framework is demonstrated on the automatic circuit generation of a complex mathematical expression related to Computational Fluid Dynamics (CFD). The framework produces a design that is 15-27% faster than the unpartitioned, globally controlled one, at the price of a modest area increase. The framework also automatically produces well-placeable, deadlock-free partitions for complex expressions, whereas traditional partitioners cannot target these objectives.

2010 12th International Workshop on Cellular Nanoscale Networks and their Applications (CNNA 2010) | 2010

A CNN motivated array computing model

Péter Szolgay; Zoltán Nagy

As CMOS circuits approach the limits of scaling down, where transistors can switch faster and faster, transmitting information between different areas of an integrated circuit is of great importance. The speed of signals is determined by the physical properties of the medium; therefore, the distance between the elements should be decreased to improve performance. Array processors are a good candidate to solve this problem. A similar approach is required on today's high-performance field-programmable logic devices, where wire delay dominates over gate (LUT) delay. The centralized control unit of a configurable accelerator might become a performance bottleneck on current state-of-the-art FPGAs. In this paper a process-network-inspired approach is given to create distributed control units. The advantage of the proposed method is shown by designing a complex multi-layer array computing architecture to emulate the operation of a mammalian retina in real time.
