Kris Gaj | Researchain

Publication

Featured research published by Kris Gaj.

Archive | 2009

Cryptographic Hardware and Embedded Systems - CHES 2009

Christophe Clavier; Kris Gaj

Software Implementations.- Faster and Timing-Attack Resistant AES-GCM.- Accelerating AES with Vector Permute Instructions.- SSE Implementation of Multivariate PKCs on Modern x86 CPUs.- MicroEliece: McEliece for Embedded Devices.
Invited Talk 1.- Physical Unclonable Functions and Secure Processors.
Side Channel Analysis of Secret Key Cryptosystems.- Practical Electromagnetic Template Attack on HMAC.- First-Order Side-Channel Attacks on the Permutation Tables Countermeasure.- Algebraic Side-Channel Attacks on the AES: Why Time also Matters in DPA.- Differential Cluster Analysis.
Side Channel Analysis of Public Key Cryptosystems.- Known-Plaintext-Only Attack on RSA-CRT with Montgomery Multiplication.- A New Side-Channel Attack on RSA Prime Generation.
Side Channel and Fault Analysis Countermeasures.- An Efficient Method for Random Delay Generation in Embedded Software.- Higher-Order Masking and Shuffling for Software Implementations of Block Ciphers.- A Design Methodology for a DPA-Resistant Cryptographic LSI with RSL Techniques.- A Design Flow and Evaluation Framework for DPA-Resistant Instruction Set Extensions.
Invited Talk 2.- Crypto Engineering: Some History and Some Case Studies.
Pairing-Based Cryptography.- Hardware Accelerator for the Tate Pairing in Characteristic Three Based on Karatsuba-Ofman Multipliers.- Faster F_p-Arithmetic for Cryptographic Pairings on Barreto-Naehrig Curves.- Designing an ASIP for Cryptographic Pairings over Barreto-Naehrig Curves.
New Ciphers and Efficient Implementations.- KATAN and KTANTAN - A Family of Small and Efficient Hardware-Oriented Block Ciphers.- Programmable and Parallel ECC Coprocessor Architecture: Tradeoffs between Area, Speed and Security.- Elliptic Curve Scalar Multiplication Combining Yao's Algorithm and Double Bases.
TRNGs and Device Identification.- The Frequency Injection Attack on Ring-Oscillator-Based True Random Number Generators.- Low-Overhead Implementation of a Soft Decision Helper Data Algorithm for SRAM PUFs.- CDs Have Fingerprints Too.
Invited Talk 3.- The State-of-the-Art in IC Reverse Engineering.
Hot Topic Session: Hardware Trojans and Trusted ICs.- Trojan Side-Channels: Lightweight Hardware Trojans through Side-Channel Engineering.- MERO: A Statistical Approach for Hardware Trojan Detection.
Theoretical Aspects.- On Tamper-Resistance from a Theoretical Viewpoint.- Mutual Information Analysis: How, When and Why?
Fault Analysis.- Fault Attacks on RSA Signatures with Partially Unknown Messages.- Differential Fault Analysis on DES Middle Rounds.

Cryptographic Hardware and Embedded Systems | 2003

Very Compact FPGA Implementation of the AES Algorithm

Pawel Chodowiec; Kris Gaj

In this paper a compact FPGA architecture for the AES algorithm with 128-bit key targeted for low-cost embedded applications is presented. Encryption, decryption and key schedule are all implemented using small resources of only 222 Slices and 3 Block RAMs. This implementation easily fits in a low-cost Xilinx Spartan II XC2S30 FPGA. This implementation can encrypt and decrypt data streams of 150 Mbps, which satisfies the needs of most embedded applications, including wireless communication. Specific features of Spartan II FPGAs enabling compact logic implementation are explored, and a new way of implementing MixColumns and InvMixColumns transformations using shared logic resources is presented.
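The idea of sharing logic between MixColumns and InvMixColumns can be illustrated in software. The sketch below relies on a well-known algebraic identity: the InvMixColumns matrix equals the MixColumns matrix multiplied by the circulant matrix with first row (05, 00, 04, 00) over GF(2^8). Whether this is exactly the decomposition used in the paper's circuit is an assumption here; the function names are illustrative only.

```python
def xtime(b):
    """Multiply a byte by 02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def gf_mul(a, b):
    """Shift-and-add multiplication of two bytes in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def circulant(col, row):
    """Multiply a 4-byte column by the circulant matrix whose first row is `row`."""
    out = []
    for i in range(4):
        acc = 0
        for j in range(4):
            acc ^= gf_mul(row[(j - i) % 4], col[j])
        out.append(acc)
    return out


def mix_columns(col):
    return circulant(col, [0x02, 0x03, 0x01, 0x01])


def inv_mix_columns(col):
    """Direct InvMixColumns with the standard coefficients 0E, 0B, 0D, 09."""
    return circulant(col, [0x0E, 0x0B, 0x0D, 0x09])


def inv_mix_columns_shared(col):
    """InvMixColumns built from a small pre-multiplier plus the MixColumns logic."""
    return mix_columns(circulant(col, [0x05, 0x00, 0x04, 0x00]))


column = [0xDB, 0x13, 0x53, 0x45]
mixed = mix_columns(column)
print([hex(b) for b in mixed])
print(inv_mix_columns_shared(mixed) == column)
```

In hardware terms, the pre-multiplier uses only the small constants 04 and 05, so the decryption path adds little logic on top of the encryption path.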

IEEE Computer | 2008

The Promise of High-Performance Reconfigurable Computing

Tarek A. El-Ghazawi; Esam El-Araby; Miaoqing Huang; Kris Gaj; Volodymyr V. Kindratenko; Duncan A. Buell

Several high-performance computers now use field-programmable gate arrays as reconfigurable coprocessors. The authors describe the two major contemporary HPRC architectures and explore the pros and cons of each using representative applications from remote sensing, molecular dynamics, bioinformatics, and cryptanalysis.

Field Programmable Gate Arrays | 2004

An embedded true random number generator for FPGAs

Paul Kohlbrenner; Kris Gaj

Field Programmable Gate Arrays (FPGAs) are an increasingly popular choice of platform for the implementation of cryptographic systems. Until recently, designers using FPGAs had less than optimal choices for a source of truly random bits. In this paper we extend a technique that uses on-chip jitter and PLLs to a much larger class of FPGAs that do not contain PLLs. Our design uses only the Configurable Logic Blocks (CLBs) common to all FPGAs, and has a self-testing capability. Using the intrinsic jitter contained in digital circuits, we produce random bits at speeds of up to 0.5 Mbits/second with good statistical characteristics. We discuss the engineering challenges of extracting random bits from digital circuits, and we report the results of running standard statistical tests (NIST) on the output generated by our system.
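As an illustration of the kind of statistical check mentioned above, here is a minimal sketch of the frequency (monobit) test from the NIST SP 800-22 suite. It is only one test of that suite, and the abstract does not say which tests were run; the function name is illustrative.

```python
import math


def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for the balance of ones and zeros."""
    n = len(bits)
    # Map 1 -> +1 and 0 -> -1, then sum.
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


# A sequence is usually considered to pass at the 1% level if p >= 0.01.
sample = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
print(round(monobit_p_value(sample), 6))
```

A heavily biased source, such as a stuck-at-one output, drives the p-value toward zero, which is one way a self-test could flag a failing generator.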

The Cryptographers' Track at the RSA Conference | 2001

Fast Implementation and Fair Comparison of the Final Candidates for Advanced Encryption Standard Using Field Programmable Gate Arrays

Kris Gaj; Pawel Chodowiec

The results of fast implementations of all five AES final candidates using Xilinx Virtex Field Programmable Gate Arrays are presented and analyzed. Performance of several alternative hardware architectures is discussed and compared. For each of the two major types of block cipher modes, the architecture with the best throughput-to-area ratio is selected. For feedback cipher modes, all AES candidates have been implemented using the basic iterative architecture, and achieved speeds ranging from 61 Mbit/s for Mars to 431 Mbit/s for Serpent. For non-feedback cipher modes, four AES candidates have been implemented using a high-throughput architecture with pipelining inside and outside of cipher rounds, and achieved speeds ranging from 12.2 Gbit/s for Rijndael to 16.8 Gbit/s for Serpent. A new methodology for a fair comparison of the hardware performance of secret-key block ciphers has been developed and contrasted with the methodology used by the NSA team.

International Conference on Information Security | 2002

Comparative Analysis of the Hardware Implementations of Hash Functions SHA-1 and SHA-512

Tim Grembowski; Roar Lien; Kris Gaj; Nghi Nguyen; Peter Bellows; Jaroslav Flidr; Tom Lehman; Brian Schott

Hash functions are among the most widespread cryptographic primitives, and are currently used in multiple cryptographic schemes and security protocols such as IPSec and SSL. In this paper, we compare and contrast hardware implementations of the newly proposed draft hash standard SHA-512, and the old standard, SHA-1. In our implementation based on Xilinx Virtex FPGAs, the throughput of SHA-512 is equal to 670 Mbit/s, compared to 530 Mbit/s for SHA-1. Our analysis shows that the newly proposed hash standard is not only orders of magnitude more secure, but also significantly faster than the old standard. The basic iterative architectures of both hash functions are faster than the basic iterative architectures of symmetric-key ciphers with equivalent security.
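One reason a basic iterative SHA-512 core can outpace SHA-1 is visible from the algorithm parameters alone: both functions use 80 rounds per block, but SHA-512 processes 1024-bit blocks versus 512-bit blocks for SHA-1. The sketch below assumes one round per clock cycle and no extra cycles for loading or finalization (the `overhead_cycles` parameter is hypothetical and not taken from the paper).

```python
def bits_per_cycle(block_bits, rounds, overhead_cycles=0):
    """Message bits absorbed per clock cycle in a basic iterative architecture."""
    return block_bits / (rounds + overhead_cycles)


def throughput_mbps(block_bits, rounds, clock_mhz, overhead_cycles=0):
    """Throughput in Mbit/s for a given clock frequency in MHz."""
    return bits_per_cycle(block_bits, rounds, overhead_cycles) * clock_mhz


sha1 = bits_per_cycle(block_bits=512, rounds=80)
sha512 = bits_per_cycle(block_bits=1024, rounds=80)
print(sha1, sha512, sha512 / sha1)
```

Under these assumptions SHA-512 absorbs twice as many bits per cycle, so it stays faster as long as its clock frequency is more than half that of SHA-1.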

Field Programmable Gate Arrays | 2001

Fast implementations of secret-key block ciphers using mixed inner- and outer-round pipelining

Pawel Chodowiec; Po Khuon; Kris Gaj

A new design methodology for secret-key block ciphers, based on introducing an optimum number of pipeline stages inside a cipher round, is presented and evaluated. This methodology is applied to five well-known modern ciphers, Triple DES, Rijndael, RC6, Serpent, and Twofish, with the goal of first obtaining the architecture with the optimum throughput-to-area ratio, and then the architecture with the highest possible throughput. All ciphers are modeled in VHDL, and implemented using Xilinx Virtex FPGA devices. It is demonstrated that all investigated ciphers can operate with similar maximum clock frequencies, in the range from 95 to 131 MHz, limited only by the delay of a single CLB layer and delays of interconnects. Rijndael, RC6, Twofish, and Serpent achieve throughputs in the range from 12.1 Gbit/s to 16.8 Gbit/s, and Triple DES achieves a throughput of 7.5 Gbit/s. Because of the optimum speed-to-cost ratio, the proposed architecture seems to be very well suited for practical implementations of secret-key block ciphers using both FPGAs and custom ASICs. We also show that using this architecture for comparing hardware performance of secret-key block ciphers, such as AES candidates, operating in non-feedback cipher modes, leads to a more prudent and fairer analysis than comparisons based on other types of pipelined architectures.
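The reported throughputs can be related to the clock frequencies with a simple model. The sketch below assumes that a fully pipelined design accepts one new block every clock cycle, so throughput is block size times clock frequency. This is an assumption rather than a statement from the abstract, although the quoted figures are consistent with it; the function names are illustrative.

```python
def pipelined_throughput_gbps(block_bits, clock_mhz):
    """Throughput in Gbit/s when one block enters the pipeline per clock cycle."""
    return block_bits * clock_mhz / 1000.0


def implied_clock_mhz(block_bits, throughput_gbps):
    """Clock frequency in MHz implied by a throughput under the same assumption."""
    return throughput_gbps * 1000.0 / block_bits


# 128-bit block ciphers at the two ends of the reported 95 to 131 MHz range.
print(pipelined_throughput_gbps(128, 95))    # about 12.2 Gbit/s
print(pipelined_throughput_gbps(128, 131))   # about 16.8 Gbit/s

# Triple DES has a 64-bit block; 7.5 Gbit/s implies roughly 117 MHz.
print(implied_clock_mhz(64, 7.5))
```

By contrast, a basic iterative architecture that needs one clock cycle per round would divide these figures by the number of rounds, which is why pipelining matters so much in non-feedback modes.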

Cryptographic Engineering | 2009

FPGA and ASIC Implementations of AES

Kris Gaj; Pawel Chodowiec

In 1997, an effort was initiated to develop a new American encryption standard to be commonly used well into the next century. This new standard was named AES, the Advanced Encryption Standard. A new algorithm was selected through a contest organized by the National Institute of Standards and Technology (NIST). By June 1998, 15 candidate algorithms had been submitted to NIST by research groups from all over the world. After the first round of analysis was concluded in August 1999, the number of candidates was reduced to the final five. In October 2000, NIST announced its selection of Rijndael [7] as the winner of the AES contest. The official standard was published in November 2001 as FIPS (Federal Information Processing Standard) number 197 [1]. The primary criteria used by NIST to evaluate AES candidates included security, efficiency in software and hardware, and flexibility. In the absence of any major breakthroughs in the cryptanalysis of the final five candidates, and because of the relatively inconclusive results of their software performance evaluations, hardware efficiency evaluations presented during the third AES conference provided a very substantial quantitative measure that clearly differentiated the AES candidates from each other [9, 10, 12, 17, 21, 42]. The importance of this measure was reflected by a survey performed among the participants of the AES conference, in which the ranking of the candidate algorithms coincided very well with their relative speed in hardware [16, 18]. The AES evaluation process resulted in the first efficient hardware architectures for AES. The university groups contributed the first implementations of AES based on FPGAs (field programmable gate arrays) [5, 9, 11, 18]. The National Security Agency group and industry groups provided the first implementations targeting ASICs (application-specific integrated circuits) [21, 42].

Cryptographic Hardware and Embedded Systems | 2010

Fair and comprehensive methodology for comparing hardware performance of fourteen round two SHA-3 candidates using FPGAs

Kris Gaj; Ekawat Homsirikamol; Marcin Rogawski

Performance in hardware has been demonstrated to be an important factor in the evaluation of candidates for cryptographic standards. Up to now, no consensus exists on how such an evaluation should be performed in order to make it fair, transparent, practical, and acceptable for the majority of the cryptographic community. In this paper, we formulate a proposal for a fair and comprehensive evaluation methodology, and apply it to the comparison of hardware performance of 14 Round 2 SHA-3 candidates. The most important aspects of our methodology include the definition of clear performance metrics, the development of a uniform and practical interface, generation of multiple sets of results for several representative FPGA families from two major vendors, and the application of a simple procedure to convert multiple sets of results into a single ranking.
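The abstract does not spell out the procedure for merging per-family results into one ranking. One plausible approach, sketched below purely as an assumption, is to normalize each candidate's metric against a reference design on the same FPGA family and then take the geometric mean of those ratios. All names and numbers in the example are hypothetical and are not measurements from the paper.

```python
import math


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def rank_candidates(results, reference):
    """Rank candidates by the geometric mean of their normalized metric.

    results:   {candidate: {fpga_family: metric}}, higher metric is better
    reference: name of the candidate used for normalization
    """
    scores = {}
    for name, per_family in results.items():
        ratios = [per_family[f] / results[reference][f] for f in per_family]
        scores[name] = geometric_mean(ratios)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical throughput-to-area values for two hypothetical FPGA families.
results = {
    "reference": {"family_1": 1.0, "family_2": 2.0},
    "cand_A": {"family_1": 2.0, "family_2": 2.0},
    "cand_B": {"family_1": 0.5, "family_2": 3.0},
}
for name, score in rank_candidates(results, "reference"):
    print(name, round(score, 3))
```

Normalizing before averaging keeps a family with large absolute numbers from dominating the ranking, and the geometric mean treats a 2x gain on one family and a 2x loss on another as cancelling out.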

IEEE Transactions on Computers | 2011

New Hardware Architectures for Montgomery Modular Multiplication Algorithm

Miaoqing Huang; Kris Gaj; Tarek A. El-Ghazawi

Montgomery modular multiplication is one of the fundamental operations used in cryptographic algorithms, such as RSA and Elliptic Curve Cryptosystems. At CHES 1999, Tenca and Koç proposed the Multiple-Word Radix-2 Montgomery Multiplication (MWR2MM) algorithm and introduced a now-classic architecture for implementing Montgomery multiplication in hardware. With parameters optimized for minimum latency, this architecture performs a single Montgomery multiplication in approximately 2n clock cycles, where n is the size of operands in bits. In this paper, we propose two new hardware architectures that are able to perform the same operation in approximately n clock cycles with almost the same clock period. These two architectures are based on precomputing partial results using two possible assumptions regarding the most significant bit of the previous word. These two architectures outperform the original architecture of Tenca and Koç in terms of the latency-area product by 23 and 50 percent, respectively, for several of the most common operand sizes used in cryptography. The architecture in radix-2 can be extended to the case of radix-4, while preserving a factor of two speedup over the corresponding radix-4 design by Tenca, Todorov, and Koç from CHES 2001. Our optimization has been verified by modeling it using Verilog-HDL, implementing it on a Xilinx Virtex-II 6000 FPGA, and experimentally testing it using an SRC-6 reconfigurable computer.
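For readers unfamiliar with the underlying operation, the following is a minimal software sketch of plain radix-2 (bit-serial) Montgomery multiplication, which computes X·Y·2^(-n) mod M for an odd modulus M. It is not the multiple-word MWR2MM algorithm and does not model either of the proposed architectures; it only shows the one-bit-per-iteration recurrence that such hardware implements.

```python
def montgomery_multiply(x, y, m, n):
    """Return x * y * 2^(-n) mod m using radix-2 Montgomery multiplication.

    Requires m odd, 0 <= x < 2**n and 0 <= y < m.
    """
    if m % 2 == 0:
        raise ValueError("modulus must be odd")
    s = 0
    for i in range(n):
        if (x >> i) & 1:   # add y when bit i of x is set
            s += y
        if s & 1:          # make s even by adding the odd modulus
            s += m
        s >>= 1            # exact division by 2
    if s >= m:             # single final subtraction, since s < 2m
        s -= m
    return s


m, n = 239, 8
x, y = 123, 200
r = montgomery_multiply(x, y, m, n)
# Multiplying the result by 2^n recovers x*y modulo m.
print((r << n) % m == (x * y) % m)
```

The loop runs once per bit of the operand, which is why a design that finishes one iteration per clock cycle needs roughly n cycles in total.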
