Milena Milenkovic | Researchain

Publication

Featured research published by Milena Milenkovic.

acm southeast regional conference | 2004

Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite

Hussein R. Al-Zoubi; Aleksandar Milenkovic; Milena Milenkovic

Replacement policy, one of the key factors determining the effectiveness of a cache, becomes even more important with the latest technological trends toward highly associative caches. State-of-the-art processors employ various policies such as Random, Least Recently Used (LRU), Round-Robin, and PLRU (Pseudo LRU), indicating that there is no common wisdom about the best one. The optimal yet unattainable policy would replace the cache memory block whose next reference is the farthest away in the future, among all memory blocks present in the set. In our quest for a replacement policy as close to optimal as possible, we thoroughly explored the design space of existing replacement mechanisms using the SimpleScalar toolset and the SPEC CPU2000 benchmark suite, across a wide range of cache sizes and organizations. In order to better understand the behavior of different policies, we introduced new measures, such as the cumulative distribution of cache hits in the LRU stack. We also dynamically monitored the number of cache misses per 100,000 instructions. Our results show that the PLRU techniques can approximate and even outperform LRU with much lower complexity, for a wide range of cache organizations. However, a relatively large gap between LRU and the optimal replacement policy, of up to 50%, indicates that new research aimed at closing the gap is necessary. The cumulative distribution of cache hits in the LRU stack indicates a very good potential for way prediction using LRU information, since the percentage of hits to the bottom of the LRU stack is relatively high.
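To make the gap between LRU and the optimal policy concrete, here is a minimal Python sketch that simulates a single cache set under both policies and records the LRU stack position of each hit. It is an illustration only, not the SimpleScalar setup used in the paper: the function names, the single-set model, and the toy trace are assumptions.

```python
from collections import Counter


def lru_misses(trace, ways):
    """Simulate one cache set under LRU.

    Returns (miss count, Counter of hits per LRU stack position),
    where position 0 is the most recently used block.
    """
    stack = []  # index 0 = most recently used
    misses = 0
    hit_positions = Counter()
    for block in trace:
        if block in stack:
            hit_positions[stack.index(block)] += 1
            stack.remove(block)
        else:
            misses += 1
            if len(stack) == ways:
                stack.pop()  # evict the least recently used block
        stack.insert(0, block)
    return misses, hit_positions


def opt_misses(trace, ways):
    """Simulate one cache set under the optimal (farthest next reference) policy."""
    resident = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in resident:
            continue
        misses += 1
        if len(resident) == ways:
            future = trace[i + 1:]

            def next_use(b):
                # Blocks never referenced again count as farthest away.
                return future.index(b) if b in future else len(future)

            # sorted() only makes the tie-break deterministic.
            victim = max(sorted(resident), key=next_use)
            resident.remove(victim)
        resident.add(block)
    return misses


if __name__ == "__main__":
    toy_trace = ["A", "B", "C", "D", "A", "B", "C", "D"]  # hypothetical trace
    print("LRU misses:", lru_misses(toy_trace, ways=3)[0])
    print("OPT misses:", opt_misses(toy_trace, ways=3))
```

On this cyclic toy trace with three ways, LRU misses on all eight references while the optimal policy misses five times, which shows how such a gap can arise. The hit-position counter is a simplified stand-in for the paper's cumulative distribution of cache hits in the LRU stack.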

southeastern symposium on system theory | 2002

An accelerometer-based physical rehabilitation system

Milena Milenkovic; Emil Jovanov; J. Chapman; Dejan Raskovic; J. Price

This paper presents a portable physical rehabilitation monitoring system based on a personal network of intelligent sensors. Rehabilitation is traditionally carried out in hospitals under supervision of qualified personnel. However, significantly better results could be achieved using out-of-hospital portable monitoring to allow patients computer-assisted rehabilitation in their homes. The new generation of personal digital assistants (PDA), such as Compaq iPAQ, offers large processing power, decent graphical user interface, and compact flash based secondary memory. Therefore, they are perfectly suited for portable monitoring units. Individual sensors are positioned on limbs to analyze movements using 2-axis MEMS accelerometers. The system monitors periods and forces of individual sensors, visualizing relevant physiological data in real-time on PDA, and archiving progress data on compact flash. A specialist supervises current advance and sets new optimum rehabilitation modes, thresholds for forces, step periods, etc. The system generates real-time warnings when predefined thresholds have been exceeded. We are developing a system for hip and knee replacement rehabilitation, as well as general physical rehabilitation. Other possible applications of our system include rehabilitation of stroke and heart attack patients.

southeastern symposium on system theory | 2005

An environment for runtime power monitoring of wireless sensor network platforms

Aleksandar Milenkovic; Milena Milenkovic; Emil Jovanov; Dennis Hite; Dejan Raskovic

Wireless sensor networks emerged as a key technology for prolonged, unsupervised monitoring in a wide spectrum of applications, from biological and environmental to civil and military. The sensor networks should operate autonomously for a long period of time under stringent resource and energy constraints. Energy conservation and power-awareness have become a focus of a number of research efforts, as sensor network nodes must operate on batteries or use energy extracted from the environment, such as solar energy or vibrations. Runtime power measurements and characterization of real existing systems are crucial for studies that target power optimizations, including techniques for dynamic adaptation based on the current energy status. This paper introduces an environment for unobtrusive real-time power monitoring that could be used for a number of wireless sensor platforms. We describe our methodology for calibration and validation of the environment and give empirical data for the Telos wireless sensor platform when it runs a subset of representative applications.

compilers, architecture, and synthesis for embedded systems | 2005

Hardware support for code integrity in embedded processors

Milena Milenkovic; Aleksandar Milenkovic; Emil Jovanov

Computer security becomes increasingly important with continual growth of the number of interconnected computing platforms. Moreover, as capabilities of embedded processors increase, the applications running on these systems also grow in size and complexity, and so does the number of security vulnerabilities. Attacks that impair code integrity by injecting and executing malicious code are one of the major security issues. This problem can be addressed at different levels, from more secure software and operating systems, down to solutions that require hardware support. Most of the existing techniques tackle the problem of security flaws at the software level, but this approach lacks generality and often induces prohibitive overhead in performance and cost, or generates a significant number of false alarms. On the other hand, a further increase in the number of transistors on a single chip enables integrated hardware support for functions that formerly were restricted to the software domain. Hardware-supported defense techniques have the potential to be more general and more efficient than solely software solutions. This paper proposes four new architectural extensions to ensure complete run-time code integrity using instruction block signature verification. The experimental analysis shows that the proposed techniques have low performance and energy overhead. In addition, the proposed mechanism has low hardware complexity, and does not impose either changes to the compiler or changes to the existing instruction set architecture.

international conference on communications | 2003

Exploiting streams in instruction and data address trace compression

Aleksandar Milenkovic; Milena Milenkovic

Novel research ideas in computer architecture are frequently evaluated using trace-driven simulation. The large size of traces incited different techniques for trace reduction. These techniques often combine standard compression algorithms with trace-specific solutions, taking into account the tradeoff between reduction in the trace size and simulation slowdown due to compression. This paper introduces SBC, a new algorithm for instruction and data address trace compression based on instruction streams. The proposed technique significantly reduces trace size and simulation time, and can be successfully combined with general compression algorithms. The SBC technique combined with gzip reduces the size of SPEC CPU2000 traces 59-97930 times, and combined with Sequitur 65-185599 times.

ACM Transactions on Modeling and Computer Simulation | 2007

An efficient single-pass trace compression technique utilizing instruction streams

Aleksandar Milenkovic; Milena Milenkovic

Trace-driven simulations have been widely used in computer architecture for quantitative evaluations of new ideas and design prototypes. Efficient trace compression and fast decompression are crucial for contemporary workloads, as representative benchmarks grow in size and number. This article presents Stream-Based Compression (SBC), a novel technique for single-pass compression of address traces. The SBC technique compresses both instruction and data addresses by associating them with a particular instruction stream, that is, a block of consecutively executing instructions. The compressed instruction trace is a trace of instruction stream identifiers. The compressed data address trace encompasses the data address stride and the number of repetitions for each memory-referencing instruction in a stream, ordered by the corresponding stream appearances in the trace. SBC reduces the size of SPEC CPU2000 Dinero instruction and data address traces from 18 to 309 times, outperforming the best trace compression techniques presented in the open literature. SBC can be successfully combined with general-purpose compression techniques. The combined SBC-gzip compression ratio is from 80 to 35,595, and the SBC-bzip2 compression ratio is from 75 to 191,257. Moreover, SBC outperforms other trace compression techniques when both decompression time and compression time are considered. This article also shows how the SBC algorithm can be modified for hardware implementation with very modest resources and only a minor loss in compression ratio.
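The instruction-trace half of this idea can be sketched in a few lines of Python. This is a simplified illustration rather than the published SBC algorithm: it assumes fixed 4-byte instructions, identifies a stream by its (start address, length) pair, and ignores data addresses. All function names are hypothetical.

```python
def split_streams(addresses, insn_size=4):
    """Split an instruction address trace into streams.

    A stream is a run of consecutively executing instructions; a new
    stream starts whenever the next address is not prev + insn_size.
    Returns a list of (start_address, length) pairs.
    """
    if not addresses:
        return []
    streams = []
    start = prev = addresses[0]
    length = 1
    for addr in addresses[1:]:
        if addr == prev + insn_size:
            length += 1
        else:
            streams.append((start, length))
            start, length = addr, 1
        prev = addr
    streams.append((start, length))
    return streams


def compress(addresses, insn_size=4):
    """Return (stream table, trace of stream identifiers)."""
    table = []   # identifier -> (start_address, length)
    index = {}   # (start_address, length) -> identifier
    ids = []
    for stream in split_streams(addresses, insn_size):
        if stream not in index:
            index[stream] = len(table)
            table.append(stream)
        ids.append(index[stream])
    return table, ids


def decompress(table, ids, insn_size=4):
    """Expand a stream identifier trace back into instruction addresses."""
    addresses = []
    for stream_id in ids:
        start, length = table[stream_id]
        addresses.extend(start + k * insn_size for k in range(length))
    return addresses


if __name__ == "__main__":
    # Hypothetical trace: a three-instruction loop body executed twice, then an exit.
    trace = [0x100, 0x104, 0x108, 0x100, 0x104, 0x108, 0x200]
    table, ids = compress(trace)
    print(table, ids)
    assert decompress(table, ids) == trace
```

Because loops revisit the same streams, the identifier trace is much shorter than the address trace and is highly repetitive, which is why it combines well with a general-purpose compressor such as gzip.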

ACM Sigarch Computer Architecture News | 2005

Using instruction block signatures to counter code injection attacks

Milena Milenkovic; Aleksandar Milenkovic; Emil Jovanov

With more computing platforms connected to the Internet each day, computer system security has become a critical issue. One of the major security problems is the execution of malicious injected code. In this paper we propose new processor extensions that allow execution of trusted instructions only. The proposed extensions verify instruction block signatures at run time. Signatures are generated during a trusted installation process, using a multiple input signature register (MISR), and stored in an encrypted form. The coefficients of the MISR and the key used for signature encryption are based on a hidden processor key. Signature verification is done in the background, concurrently with program execution, thus reducing the negative impact on performance. The preliminary results indicate that the proposed processor extensions will prevent execution of any unauthorized code at a relatively small increase in system complexity and execution time.
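As a rough illustration of how a MISR folds an instruction block into a signature, the following Python sketch models a 32-bit shift register with feedback that XORs in one instruction word per step. The tap constant, the instruction words, and the function names are assumptions chosen for the example; the paper derives the coefficients from a hidden processor key and also encrypts the stored signatures, neither of which is modeled here.

```python
MASK32 = 0xFFFFFFFF


def misr_signature(words, taps, seed=0):
    """Compute a 32-bit MISR-style signature over a block of instruction words.

    Each step shifts the state left by one bit, applies the feedback
    taps if the bit shifted out was 1, then XORs in the next word.
    """
    state = seed & MASK32
    for word in words:
        msb = (state >> 31) & 1
        state = (state << 1) & MASK32
        if msb:
            state ^= taps
        state ^= word & MASK32
    return state


def verify_block(words, expected_signature, taps):
    """Return True if the block's signature matches the trusted one."""
    return misr_signature(words, taps) == expected_signature


if __name__ == "__main__":
    TAPS = 0x04C11DB7  # hypothetical coefficients, not taken from the paper
    block = [0x8C010000, 0x00221820, 0x03E00008]  # hypothetical instruction words
    trusted = misr_signature(block, TAPS)  # would be computed at installation

    tampered = list(block)
    tampered[1] ^= 0x00000040  # simulate an injected or modified instruction
    print("original verifies:", verify_block(block, trusted, TAPS))
    print("tampered verifies:", verify_block(tampered, trusted, TAPS))
```

In this sketch the signature is recomputed in software; the paper's proposal performs the equivalent check in hardware, in the background and concurrently with program execution.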

IEEE Computer Architecture Letters | 2002

Page-Level Behavior of Cache Contention

Aleksandar Milenkovic; Milena Milenkovic

Cache misses in small, limited-associativity primary caches very often replace live cache blocks, given the dominance of capacity and conflict misses. Towards motivating novel cache organizations, we study the comparative characteristics of the virtual memory address pairs involved in typical primary-cache contention (block replacements) for the SPEC2000 integer benchmarks. We focus on the cache tag bits, and results show that (i) often just a few tag bits differ between contending addresses, and (ii) accesses to certain segments or page groups of the virtual address space (i.e. certain tag-bit groups) contend frequently. Cache-conscious virtual address space allocation can further reduce the number of conflicting tag bits. We mention two directions for exploiting such page-level contention patterns to improve cache cost and performance.

IEEE Transactions on Computers | 2011

Caches and Predictors for Real-Time, Unobtrusive, and Cost-Effective Program Tracing in Embedded Systems

Aleksandar Milenkovic; Vladimir Uzelac; Milena Milenkovic; Martin Burtscher

The increasing complexity of modern embedded computer systems makes software development and system verification the most critical steps in system development. To expedite verification and program debugging, chip manufacturers increasingly consider hardware infrastructure for program debugging and tracing, including logic to capture and filter traces, buffers to store traces, and a trace port through which the trace is read by the debug tools. In this paper, we introduce a new approach to capture and compress program execution traces in hardware. The proposed trace compressor encompasses two cost-effective structures, a stream descriptor cache, and a last stream predictor. Information about the program flow is translated into a sequence of hit and miss events in these structures, thus dramatically reducing the number of bits that need to be sent out of the chip. We evaluate the efficiency of the proposed mechanism by measuring the trace port bandwidth on a set of benchmark programs. Our mechanism requires only 0.15 bits/instruction/CPU on average on the trace port, which is a sixfold improvement over state-of-the-art commercial solutions. The trace compressor requires an on-chip area that is equivalent to one third of a 1 kilobyte cache and it allows for continual and unobtrusive program tracing in real time.

compilers, architecture, and synthesis for embedded systems | 2010

Real-time unobtrusive program execution trace compression using branch predictor events

Vladimir Uzelac; Aleksandar Milenkovic; Martin Burtscher; Milena Milenkovic

Unobtrusive capturing of program execution traces in real-time is crucial in debugging cyber-physical systems. However, tracing even limited program segments is often cost-prohibitive, requiring wide trace ports and large on-chip trace buffers. This paper introduces a new cost-effective technique for capturing and compressing program execution traces in real time. It uses branch predictor-like structures in the trace module to losslessly compress the traces. This approach results in high compression ratios because it only has to transmit misprediction events to the software debugger. Coupled with an effective variable encoding scheme, our technique requires merely 0.036 bits/instruction of trace port bandwidth (a 28-fold improvement over the commercial state-of-the-art) at a cost of roughly 5,200 logic gates.
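The core idea, sending only misprediction events because the debugger can replay an identical predictor, can be sketched in Python as follows. This is a toy model under stated assumptions, not the paper's design: it uses a one-bit last-outcome predictor, represents the program as a hypothetical map from each branch address to the next branch reached on the taken and not-taken paths, and encodes the trace as plain run lengths instead of the paper's variable encoding scheme.

```python
class LastOutcomePredictor:
    """One-bit predictor: predicts each branch repeats its last outcome."""

    def __init__(self):
        self.table = {}

    def predict(self, pc):
        return self.table.get(pc, False)  # default prediction: not taken

    def update(self, pc, taken):
        self.table[pc] = taken


def compress(outcomes):
    """Encode (pc, taken) pairs as runs of correct predictions.

    Every run except the last is terminated by exactly one misprediction.
    """
    predictor = LastOutcomePredictor()
    runs, run = [], 0
    for pc, taken in outcomes:
        if predictor.predict(pc) == taken:
            run += 1
        else:
            runs.append(run)
            run = 0
        predictor.update(pc, taken)
    runs.append(run)
    return runs


def decompress(runs, program, entry_pc):
    """Rebuild the branch trace by replaying the same predictor.

    program maps a branch pc to (next branch pc if taken,
    next branch pc if not taken); it stands in for the program
    binary that the software debugger has available.
    """
    predictor = LastOutcomePredictor()
    pc = entry_pc
    outcomes = []
    for i, run in enumerate(runs):
        is_last = i == len(runs) - 1
        for step in range(run + (0 if is_last else 1)):
            guess = predictor.predict(pc)
            # Within the run the guess was right; the terminating event flips it.
            taken = guess if step < run else not guess
            outcomes.append((pc, taken))
            predictor.update(pc, taken)
            pc = program[pc][0 if taken else 1]
    return outcomes


if __name__ == "__main__":
    # Hypothetical two-branch program: a loop branch at 0x10 and a branch at 0x20.
    program = {0x10: (0x10, 0x20), 0x20: (0x10, 0x20)}
    trace = [(0x10, True), (0x10, True), (0x10, True), (0x10, False),
             (0x20, True), (0x10, True), (0x10, True), (0x10, False),
             (0x20, False)]
    runs = compress(trace)
    print(runs)
    assert decompress(runs, program, entry_pc=0x10) == trace
```

A better predictor yields longer runs and therefore fewer transmitted events, which is the lever the abstract describes; the round-trip assertion checks that the encoding is lossless for this toy program.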
