Ayesha Khalid
RWTH Aachen University
Publication
Featured research published by Ayesha Khalid.
Design Automation Conference | 2013
Khawar Shahzad; Ayesha Khalid; Zoltán Endre Rákossy; Goutam Paul; Anupam Chattopadhyay
Cryptographic coprocessors are an inherent part of modern Systems-on-Chip. They serve a dual purpose: efficient execution of cryptographic kernels and support for protocols that prevent IP piracy. Flexibility in such coprocessors is required to provide protection against emerging cryptanalytic schemes and to support different cryptographic functions like encryption and authentication. In this context, a novel crypto-coprocessor, named CoARX, supporting multiple cryptographic algorithms based on Addition (A), Rotation (R) and eXclusive-or (X) operations is proposed. CoARX supports diverse ARX-based cryptographic primitives. We show that, compared to dedicated hardware implementations and general-purpose microprocessors, it offers an excellent performance-flexibility trade-off, including adaptability to resist generic cryptanalysis.
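To make the ARX idea concrete, the sketch below shows one small ARX building block using only 32-bit addition, rotation, and XOR. It uses the ChaCha quarter round purely as a well-known illustration; the abstract does not say which primitives CoARX supports, so the choice of ChaCha and the function names here are assumptions.

```python
MASK32 = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def rotl32(value, shift):
    """Rotate a 32-bit word left by `shift` bits (the R in ARX)."""
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def arx_quarter_round(a, b, c, d):
    """ChaCha quarter round: four add / xor / rotate steps on four words."""
    a = (a + b) & MASK32   # Addition
    d = rotl32(d ^ a, 16)  # eXclusive-or, then Rotation
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


if __name__ == "__main__":
    words = arx_quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print([hex(w) for w in words])
```

Because every step is one of three cheap word operations, a single datapath with an adder, a rotator, and an XOR unit can in principle serve many different ARX algorithms, which is the kind of flexibility the abstract argues for.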
Cryptography and Communications | 2013
Sourav Sen Gupta; Anupam Chattopadhyay; Ayesha Khalid
To date, the basic idea for implementing stream ciphers has been confined to individual standalone designs. In this paper, we introduce the notion of integrated implementation of multiple stream ciphers within a single architecture, where the goal is to achieve area and throughput efficiency by exploiting the structural similarities of the ciphers at an algorithmic level. We present two case studies to support our idea. First, we propose the merger of the SNOW 3G and ZUC stream ciphers, which constitute a part of the 3GPP LTE-Advanced security suite. We propose HiPAcc-LTE, a high-performance integrated design that combines the two ciphers in hardware, based on their structural similarities. The integrated architecture reduces the area overhead significantly compared to two distinct cores, and also provides almost double the throughput in terms of keystream generation, compared with the state-of-the-art implementations of the individual ciphers. As our second case study, we present IntAcc-RCHC, an integrated accelerator for the stream ciphers RC4 and HC-128. We show that the integrated accelerator achieves a slight reduction in area without any loss in throughput compared to our standalone implementations. We also achieve at least 1.5 times better throughput compared to general-purpose processors. The long-term vision of this hardware integration approach for cryptographic primitives is to build a flexible core supporting multiple designs having similar algorithmic structures.
International Symposium on Circuits and Systems | 2012
Anupam Chattopadhyay; Ayesha Khalid; Subhamoy Maitra; Shashwat Raizada
Due to the ubiquitous deployment of embedded systems, security and privacy are emerging as major design concerns, and new stream ciphers are being proposed by the cryptographic community. HC-128 is one of the recent stream ciphers that received attention after its selection as an eSTREAM candidate. To date, the cipher is believed to have a good security margin. In this paper we study several implementation issues for HC-128 in a disciplined manner. We first discuss the experience on embedded and customizable processors. Then we consider a dedicated hardware accelerator implementation. Further, we explore several parallelization strategies for improving throughput. To the best of our knowledge, such a detailed implementation exercise has not been presented in the literature. Our novel implementation strategies mark the fastest HC-128 execution reported to date.
IEEE Transactions on Computers | 2018
James Howe; Ayesha Khalid; Ciara Rafferty; Francesco Regazzoni; Maire O'Neill
Lattice-based cryptography is one of the most promising branches of quantum resilient cryptography, offering versatility and efficiency. Discrete Gaussian samplers are a core building block in most, if not all, lattice-based cryptosystems, and optimised samplers are desirable both for high-speed and low-area applications. Due to the inherent structure of existing discrete Gaussian sampling methods, lattice-based cryptosystems are vulnerable to side-channel attacks, such as timing analysis. In this paper, the first comprehensive evaluation of discrete Gaussian samplers in hardware is presented, targeting FPGA devices. Novel optimised discrete Gaussian sampler hardware architectures are proposed for the main sampling techniques. An independent-time design of each of the samplers is presented, offering security against side-channel timing attacks, including the first proposed constant-time Bernoulli, Knuth-Yao, and discrete Ziggurat sampler hardware designs. For a balanced performance, the Cumulative Distribution Table (CDT) sampler is recommended, with the proposed hardware CDT design achieving a throughput of 59.4 million samples per second for encryption, utilising just 43 slices on a Virtex 6 FPGA and 16.3 million samples per second for signatures with 179 slices on a Spartan 6 device.
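The table-based sampling idea can be sketched in a few lines. This is a minimal software illustration of a Cumulative Distribution Table (CDT) sampler that always scans the whole table, so the number of comparisons does not depend on the sampled value. It is not the paper's hardware design: the parameters (`sigma`, `tail`, `precision_bits`), the function names, and the floating-point table construction are all assumptions, and Python itself gives no real constant-time guarantee.

```python
import math
import secrets


def build_cdt(sigma, tail, precision_bits):
    """Cumulative table for |x| in 0..tail, scaled to 2**precision_bits."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(tail + 1)]
    weights[0] /= 2.0  # zero is reached by both signs, so halve its weight
    total = sum(weights)
    scale = 1 << precision_bits
    table, running = [], 0.0
    for weight in weights:
        running += weight
        table.append(min(scale, int(round(running / total * scale))))
    table[-1] = scale  # the last entry must cover the full range
    return table


def sample_cdt(table, precision_bits, rand_bits=secrets.randbits):
    """Draw one signed sample; every table entry is compared on every call."""
    r = rand_bits(precision_bits)
    magnitude = 0
    for threshold in table:  # full scan: no early exit that would leak timing
        magnitude += int(r >= threshold)
    sign = rand_bits(1)
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    cdt = build_cdt(sigma=3.0, tail=12, precision_bits=16)
    print([sample_cdt(cdt, 16) for _ in range(10)])
```

A binary search over the table would need fewer comparisons, but the number of steps could then vary with the random input; the full scan trades some speed for a data-independent access pattern.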
International Conference on Information Systems Security | 2013
Ayesha Khalid; Anupam Chattopadhyay; Goutam Paul
In this paper we propose RAPID-FeinSPN, an extensible framework designed for rapid prototyping of Feistel Network and Substitution-Permutation Network (SPN) based symmetric ciphers. The framework tries to bridge the gap between the designers of cryptographic schemes and the VLSI implementation engineers of those cryptographic systems. Using a GUI-based interface, the user has the freedom either to choose a well-known Feistel or SPN based cryptosystem for implementation or to specify the configuration of a new cipher. RAPID-FeinSPN supports multiple configurations of cryptographic settings and, using modular design principles, generates customized C code as well as a customized hardware implementation without significant performance degradation. This approach allows quick hardware resource estimation and early functional validation of desirable cipher properties, and can be used for benchmarking various design parameters of a cipher that vary in terms of security, complexity or both for a security-throughput trade-off. We have implemented some well-known block ciphers using RAPID-FeinSPN and benchmarked the performance against software as well as hardware implementations.
High Performance Switching and Routing | 2013
Ayesha Khalid; Rajat Sen; Anupam Chattopadhyay
Finite automata are widely used for Deep Packet Inspection (DPI) of network traffic. Two types of automata employed for this purpose are Non-deterministic Finite Automata (NFA) and Deterministic Finite Automata (DFA). An NFA suffers from a large memory bandwidth per character due to multiple active states. A DFA, in comparison, ensures a constant O(1) processing time per character, and hence time linear in the input length, for memory-based architectures. However, the DFA state explosion conditions commonly occurring in today's NIDS rule-sets render the automata with practically infeasible memory space requirements. To avoid state blowup we propose a semi-deterministic automaton, Sub-expression Integrated DFA (SI-DFA), that ensures the processing time of a single standard DFA. Rules are broken into sub-expressions at blowup conditions and compiled into a single DFA along with an association table, to correctly encapsulate equivalent automata. We list the rare cases in regular expressions for which sub-expression integration is incorrect and present a methodology to detect their occurrences. We evaluate SI-DFA on real-world rule-sets like Bro, Snort and Linux filters and compare its performance with the state-of-the-art hybrid automata solutions. SI-DFA renders a 66-97% reduction in processing bandwidth, up to 68% lower space requirement and an improvement trend with increasing rule complexity when compared to the traditional solutions.
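The trade-off between NFA and DFA matching can be seen on a textbook pattern. The sketch below is an illustration of the state explosion problem only, not of SI-DFA itself: it uses the pattern "an `a` followed by exactly `n` more characters at the end of the input" over the assumed alphabet `{a, b}`, and all function names are hypothetical.

```python
from collections import deque

ALPHABET = "ab"  # assumed two-symbol alphabet for the illustration


def nfa_step(active, ch, n):
    """Advance every active NFA state; state 0 loops, state n + 1 accepts."""
    nxt = set()
    for state in active:
        if state == 0:
            nxt.add(0)
            if ch == "a":
                nxt.add(1)
        elif state <= n:
            nxt.add(state + 1)
    return frozenset(nxt)


def build_dfa(n):
    """Subset construction: each DFA state is a set of NFA states."""
    start = frozenset({0})
    table = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in table:
            continue
        table[state] = {}
        for ch in ALPHABET:
            nxt = nfa_step(state, ch, n)
            table[state][ch] = nxt
            if nxt not in table:
                queue.append(nxt)
    return table, start


def dfa_match(table, start, text, n):
    """One table lookup per character, whatever the pattern size."""
    state = start
    for ch in text:
        state = table[state][ch]
    return (n + 1) in state


if __name__ == "__main__":
    for n in range(1, 8):
        table, _ = build_dfa(n)
        print(f"n={n}: NFA states={n + 2}, DFA states={len(table)}")
```

For this pattern the NFA needs only `n + 2` states but may have all of them active at once, while the DFA does a single lookup per character at the cost of `2 ** (n + 1)` states. That exponential growth is the blowup condition that motivates splitting rules into sub-expressions.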
International Conference on Information Security and Cryptology | 2012
Ayesha Khalid; Deblin Bagchi; Goutam Paul; Anupam Chattopadhyay
The ease of programming offered by the CUDA programming model attracted a lot of programmers to try the platform for acceleration of many non-graphics applications. Cryptography, being no exception, also found its share of exploration efforts, especially block ciphers. In this contribution we present a detailed walk-through of effective mapping of HC-128 and HC-256 stream ciphers on GPUs. Due to inherent inter-S-Box dependencies, intra-S-Box dependencies and a high number of memory accesses per keystream word generation, parallelization of HC series of stream ciphers remains challenging. For the first time, we present various optimization strategies for HC-128 and HC-256 speedup in tune with CUDA device architecture. The peak performance achieved with a single data-stream for HC-128 and HC-256 is 0.95 Gbps and 0.41 Gbps respectively. Although these throughput figures do not beat the CPU performance (10.9 Gbps for HC-128 and 7.5 Gbps for HC-256), our multiple parallel data-stream implementation is benchmarked to reach approximately 31 Gbps for HC-128 and 14 Gbps for HC-256 (with 32768 parallel data-streams). To the best of our knowledge, this is the first reported effort of mapping HC-Series of stream ciphers on GPUs.
Journal of Cryptographic Engineering | 2016
Ayesha Khalid; Muhammad Hassan; Goutam Paul; Anupam Chattopadhyay
Block ciphers are the most prominent symmetric-key cryptography kernels, serving as fundamental building blocks to many other cryptographic functions. This work presents RunFein, a tool for rapid prototyping of a major class of block ciphers, namely product ciphers (including Feistel network and Substitution permutation network-based block ciphers). RunFein accepts the algorithmic configuration of an existing/new block cipher from the user through a GUI to generate a customized software implementation. The user may choose from various micro-architectural templates (unrolled, pipelined, sub-pipelined) to generate an HDL description of the cipher. Various modes of operation and the NIST test suite may also be included. This high-level design approach eliminates the laborious and repetitive development efforts for VLSI realizations of block ciphers. It allows a quick design exploration, consequently enabling fast benchmarking in terms of critical resource estimation of various versions/configurations of a cipher that varies in terms of security, complexity and performance. Using RunFein, we have successfully implemented some well-known product ciphers and benchmarked their performance without significant degradation against their published hand-crafted implementations in literature.
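The Feistel structure that such a tool parameterizes can be written generically. The sketch below is a hedged illustration of a configurable Feistel network, not RunFein's generated code: the round function `toy_round_function`, the word size, and the round keys are hypothetical placeholders with no security value.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit half-blocks


def toy_round_function(half, round_key):
    """Hypothetical, insecure round function used only to exercise the structure."""
    mixed = ((half * 0x9E3779B1) ^ round_key) & MASK32
    return ((mixed << 5) | (mixed >> 27)) & MASK32


def feistel_encrypt(left, right, round_keys, f=toy_round_function):
    """Run one Feistel round per key, then undo the final swap."""
    for key in round_keys:
        left, right = right, left ^ f(right, key)
    return right, left


def feistel_decrypt(left, right, round_keys, f=toy_round_function):
    """Decryption is the same network with the round keys reversed."""
    return feistel_encrypt(left, right, list(reversed(round_keys)), f)


if __name__ == "__main__":
    keys = [0x0F1E2D3C, 0x4B5A6978, 0x8796A5B4, 0xC3D2E1F0]
    ciphertext = feistel_encrypt(0x01234567, 0x89ABCDEF, keys)
    print([hex(w) for w in ciphertext])
    print([hex(w) for w in feistel_decrypt(*ciphertext, keys)])
```

Because the round function is only ever applied in the forward direction, it does not need to be invertible. A new Feistel cipher can therefore be described by a round function, a key schedule, and a round count, which is the sort of configuration a prototyping tool can accept.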
International Conference on Cryptology in India | 2014
Ayesha Khalid; Prasanna Ravi; Anupam Chattopadhyay; Goutam Paul
As today’s high performance embedded systems are heterogeneous platforms, the once crisp boundary between software and hardware ciphers is fast getting murky. This work takes up the design of a dedicated hardware accelerator for HC-128, one of the stream ciphers in the software portfolio of eSTREAM finalists. We discuss a novel idea of splitting states kept in SRAMs into multiple smaller SRAMs and exploit the increased parallel accesses to achieve higher throughput. We optimize the accelerator design with state splitting by different factors. A detailed throughput-area-power analysis of these design points follows, along with a benchmarking against the state-of-the-art for HC-128. Our implementation marks an HC-128 ASIC with the highest throughput per area performance reported in the literature to date.
International Conference on Cryptology in Africa | 2013
Ayesha Khalid; Goutam Paul; Anupam Chattopadhyay
Since the introduction of the CUDA programming model, GPUs are considered a viable platform for accelerating non-graphical applications. Many cryptographic algorithms have been reported to achieve remarkable performance speedups, especially block ciphers. For stream ciphers, however, the lack of reported GPU acceleration endeavors is due to their inherent iterative structures that prohibit parallelization. In this paper, we propose an efficient implementation methodology for data-parallel cryptographic functions in a batch processing fashion on modern GPUs in general and optimizations for Salsa20 in particular. We present an autotuning framework to reach the most optimized set of device and application parameters for Salsa20 kernel variants with throughput maximization as a figure of merit. The peak performance achieved by our implementation for Salsa20/12 is 2.7 GBps and 43.44 GBps with and without memory transfers respectively on NVIDIA GeForce GTX 590. These figures beat the fastest reported GPU implementation of any stream cipher in the eSTREAM portfolio including Salsa20/12, as well as the block cipher AES optimized by hand-tuning, and thus, to the best of our knowledge set a new speed record.
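The reason Salsa20 suits batch processing is that each 64-byte keystream block depends only on the key, the nonce, and a block counter, so blocks can be computed independently. The sketch below is a plain Python rendering of the Salsa20 block function written from the public specification, assuming a 32-byte key and an 8-byte nonce; it is not the paper's GPU kernel or autotuning framework, and apart from the quarter round it has not been checked here against official test vectors.

```python
import struct

MASK32 = 0xFFFFFFFF
SIGMA = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)  # "expand 32-byte k"
COLUMNS = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11))
ROWS = ((0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))


def rotl32(value, shift):
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def quarterround(y0, y1, y2, y3):
    """Salsa20 quarter round: add, rotate, xor on four 32-bit words."""
    y1 ^= rotl32((y0 + y3) & MASK32, 7)
    y2 ^= rotl32((y1 + y0) & MASK32, 9)
    y3 ^= rotl32((y2 + y1) & MASK32, 13)
    y0 ^= rotl32((y3 + y2) & MASK32, 18)
    return y0, y1, y2, y3


def salsa20_block(key, nonce, counter, rounds=12):
    """Return one 64-byte keystream block for the given block counter."""
    k = struct.unpack("<8I", key)
    n = struct.unpack("<2I", nonce)
    state = [SIGMA[0], k[0], k[1], k[2], k[3], SIGMA[1], n[0], n[1],
             counter & MASK32, (counter >> 32) & MASK32, SIGMA[2],
             k[4], k[5], k[6], k[7], SIGMA[3]]
    x = list(state)
    for _ in range(rounds // 2):  # each iteration is a column round + row round
        for i, j, p, q in COLUMNS + ROWS:
            x[i], x[j], x[p], x[q] = quarterround(x[i], x[j], x[p], x[q])
    return struct.pack("<16I", *[(a + b) & MASK32 for a, b in zip(x, state)])


def keystream_batch(key, nonce, counters):
    """Blocks share no state, so this loop could run as parallel threads."""
    return {c: salsa20_block(key, nonce, c) for c in counters}


if __name__ == "__main__":
    batch = keystream_batch(bytes(range(32)), bytes(8), range(4))
    print({c: block[:8].hex() for c, block in batch.items()})
```

Since no block reads another block's output, the counters can be handed out in any order and to any number of workers, which is what makes a one-thread-per-block mapping natural on a GPU.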