Marc Feeley
Université de Montréal
Publication
Featured research published by Marc Feeley.
international conference on functional programming | 1996
Manuel Serrano; Marc Feeley
In this paper we present a new program analysis method which we call Storage Use Analysis. This analysis deduces how objects are used by the program and allows the optimization of their allocation. The analysis can be applied to both statically typed languages (e.g. ML) and latently typed languages (e.g. Scheme). It handles side-effects, higher-order functions, and separate compilation, and does not require CPS transformation. We show the application of our analysis to two important optimizations: stack allocation and unboxing. The first optimization replaces some heap allocations by stack allocations for user and system data storage (e.g. lists, vectors, procedures). The second optimization avoids boxing some objects. This analysis and the associated optimizations have been implemented in the Bigloo Scheme/ML compiler. Experimental results show that for many allocation-intensive programs we get a significant speedup. In particular, numerically intensive programs are almost 20 times faster because floating point numbers are unboxed and no longer heap allocated.
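As a rough illustration of the kind of question such an analysis answers, the sketch below decides, over an invented toy IR (Alloc, Return, Store, LocalUse, none of which come from the paper), which allocations may go on the stack because the object never escapes the allocating function. It is a minimal escape-style sketch, not the Bigloo analysis.

```python
from dataclasses import dataclass

@dataclass
class Alloc:      # allocate an object (e.g. a pair or a float box)
    name: str

@dataclass
class Return:     # returning a value lets it escape the function
    value: str

@dataclass
class Store:      # storing a value into another object lets it escape
    value: str

@dataclass
class LocalUse:   # car/cdr, arithmetic, comparisons: no escape
    value: str

def stack_allocatable(body):
    """Allocations whose objects never escape may be placed on the stack."""
    allocated, escaped = set(), set()
    for instr in body:
        if isinstance(instr, Alloc):
            allocated.add(instr.name)
        elif isinstance(instr, (Return, Store)):
            escaped.add(instr.value)
    return allocated - escaped

body = [Alloc("pair1"), LocalUse("pair1"),   # used only locally
        Alloc("pair2"), Return("pair2")]     # escapes via return
print(stack_allocatable(body))               # {'pair1'}
```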
international symposium on memory management | 1998
Martin Larose; Marc Feeley
We present a new near-real-time compacting collector and its implementation in a production quality Scheme compiler (Gambit-C). Our goal is to use this system as a base for an implementation of Erlang for writing soft real-time telecommunication applications. We start with a description of Gambit-C's memory organisation and its blocking collector. The design and integration of the incremental collector within Gambit-C are then explained. Finally, we measure the performance of the incremental collector and compare it to the original blocking collector. We found that the overhead of the incremental collector is high (a factor of 1.3 to 8.1, with a median of 2.24), but nevertheless the collection pauses are compatible with typical soft real-time requirements (we get an average pause of 2.9 milliseconds and a maximum pause of 15 milliseconds on a 133 MHz DEC Alpha 21064).
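The schematic sketch below conveys only the general incremental idea, not Gambit-C's collector: collection work is split into small bounded slices interleaved with the mutator, so the maximum pause depends on the slice size rather than on the heap size. The work units and slice size are invented for illustration.

```python
import time

def do_mutator_step(i):
    pass                 # stand-in for application work

def do_gc_work(unit):
    pass                 # stand-in for marking/compacting one work unit

def run_with_incremental_gc(mutator_steps, gc_work_units, slice_size=50):
    """Interleave bounded slices of GC work with mutator steps."""
    pending, pauses = list(range(gc_work_units)), []
    for step in range(mutator_steps):
        do_mutator_step(step)
        if pending:                              # take one bounded GC slice
            start = time.perf_counter()
            for _ in range(min(slice_size, len(pending))):
                do_gc_work(pending.pop())
            pauses.append(time.perf_counter() - start)
    return max(pauses) if pauses else 0.0        # maximum pause stays bounded

print("max pause (s):", run_with_incremental_gc(1000, 10_000))
```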
international conference on functional programming | 1993
Marc Feeley
Two strategies for supporting asynchronous interrupts are: the use of the processor's hardware interrupt system and the use of polling. The advantages of polling include: portability, simplicity, and low cost for handling interrupts. Unfortunately, polling has an overhead for the explicit interrupt checks inserted in the code. This paper describes balanced polling, a method for placing the interrupt checks which has a low overhead and also guarantees an upper bound on interrupt latency. This method has been used by Gambit (an optimizing native code compiler for Scheme) to support a number of features including multiprocessing and stack overflow detection. The overhead of balanced polling is less than for call-return polling, which places interrupt checks at every procedure entry and exit. The overhead of call-return polling is typically 70% larger (but sometimes over 400% larger) than the overhead of balanced polling.
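The Python sketch below shows what an inserted interrupt check (a poll) looks like in principle: cheap in the common case, and it runs pending handlers when it fires. The fixed countdown is an invented placement policy for illustration, not the balanced placement the paper describes.

```python
POLL_PERIOD = 100          # invented bound on work between checks
countdown = POLL_PERIOD
pending_interrupts = []

def poll():
    """An explicit interrupt check inserted by the compiler."""
    global countdown
    countdown -= 1
    if countdown <= 0:               # the rare case: time to look for interrupts
        countdown = POLL_PERIOD
        while pending_interrupts:
            pending_interrupts.pop()()   # run a pending handler

def compiled_loop(n):
    total = 0
    for i in range(n):
        poll()                       # check placed at the loop head
        total += i
    return total

pending_interrupts.append(lambda: print("interrupt handled"))
print(compiled_loop(250))            # 31125, with the handler run along the way
```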
Proceedings of the US/Japan Workshop on Parallel Symbolic Computing: Languages, Systems, and Applications | 1992
Marc Feeley
This paper describes an implementation technique for Multilisps future construct aimed at large shared-memory multiprocessors. The technique is a variant of lazy task creation. The original implementation of lazy task creation described in [Mohr, 199l] relies on efficient shared memory to distribute tasks between processors. In contrast, we propose a task distribution method based on a message passing paradigm. Its main advantages are that it is simpler to implement, has a lower cost for locally run tasks, and allows full caching of the stack on cache incoherent machines. Benchmarks on a 32 processor BBN TC2000 show that our method is more efficient than the original implementation by as much as a factor of 2.
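For readers unfamiliar with the construct, the sketch below shows only the future/touch semantics that any such implementation must provide, rendered with Python's standard thread pool; it does not model lazy task creation or the message-passing task distribution that the paper is about.

```python
import time
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def future(thunk):
    """(future expr): start evaluating expr and immediately return a placeholder."""
    return pool.submit(thunk)

def touch(f):
    """Touching a future blocks until its value is available."""
    return f.result()

def slow_square(x):
    time.sleep(0.01)       # stand-in for real work
    return x * x

fs = [future(lambda x=x: slow_square(x)) for x in range(8)]  # evaluated in parallel
print(sum(touch(f) for f in fs))                             # 140
```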
international conference on functional programming | 1990
Marc Feeley; James S. Miller
Programs compiled by Gambit, our Scheme compiler, achieve performance as much as twice that of the fastest available Scheme compilers. Gambit is easily ported, while retaining its high performance, through the use of a simple virtual machine (PVM). PVM allows a wide variety of machine-independent optimizations and it supports parallel computation based on the future construct. PVM conveys high-level information bidirectionally between the machine-independent front end of the compiler and the machine-dependent back end, making it easy to implement a number of common back end optimizations that are difficult to achieve for other virtual machines. PVM is similar to many real computer architectures and has an option to efficiently gather dynamic measurements of virtual machine usage. These measurements can be used in performance prediction for ports to other architectures as well as design decisions related to proposed optimizations and object representations.
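PVM itself resembles real machine architectures; the toy stack machine below is only an invented three-instruction illustration of what it means to compile an expression to a simple virtual machine that a back end can then interpret or translate further.

```python
def run(code, env):
    """Execute a list of (opcode, operand) pairs on an operand stack."""
    stack = []
    for op, arg in code:
        if op == "push_const":
            stack.append(arg)
        elif op == "push_var":
            stack.append(env[arg])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack.pop()

# (+ x 1) compiled to the toy instruction set
print(run([("push_var", "x"), ("push_const", 1), ("add", None)], {"x": 41}))  # 42
```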
Computer Languages | 1987
Marc Feeley; Guy Lapalme
This paper describes a new approach to compiling which is based on the extensive use of closures. In this method, a compiled expression is embodied by a closure whose application performs the evaluation of the given expression. For each primitive construct contained in the expression to compile, a closure is generated. As a whole, the compiled expression consists of a network of these closures. In a way, ‘code generation’ is replaced by ‘closure generation’. This method, combined with an efficient closure implementation, produces compiled code which compares favorably (in execution time) with its interpreted counterpart. It can also be used to implement compilers for embedded languages and, as it has been implemented in Scheme, it yields a straightforward metacircular compiler for Scheme.
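The technique translates directly to any language with closures. Below is a minimal Python rendering of the idea on an invented toy expression language (constants, variables, +, if, one-argument lambda and call); each form is compiled once into a closure, and evaluating a compiled expression is just applying its closure to an environment.

```python
def compile_expr(expr):
    """Return a closure that, applied to an environment, evaluates expr."""
    if isinstance(expr, (int, float)):                 # constant
        return lambda env: expr
    if isinstance(expr, str):                          # variable reference
        return lambda env: env[expr]
    op, *args = expr
    if op == "+":
        c1, c2 = compile_expr(args[0]), compile_expr(args[1])
        return lambda env: c1(env) + c2(env)           # a small network of closures
    if op == "if":
        c_test, c_then, c_else = (compile_expr(a) for a in args)
        return lambda env: c_then(env) if c_test(env) else c_else(env)
    if op == "lambda":                                 # one-parameter lambda
        param, body = args
        c_body = compile_expr(body)
        return lambda env: lambda x: c_body({**env, param: x})
    if op == "call":
        c_fn, c_arg = compile_expr(args[0]), compile_expr(args[1])
        return lambda env: c_fn(env)(c_arg(env))
    raise ValueError(f"unknown form: {expr!r}")

# ((lambda (x) (+ x 1)) 41) written as nested tuples
prog = ("call", ("lambda", "x", ("+", "x", 1)), 41)
print(compile_expr(prog)({}))    # 42
```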
Acta Informatica | 2000
Danny Dubé; Marc Feeley
We show in this paper that parsing with regular expressions instead of context-free grammars, when it is possible, is desirable. We present efficient algorithms for performing different tasks that concern parsing: producing the external representation and the internal representation of parse trees; producing all possible parse trees or a single one. Each of our algorithms to produce a parse tree from an input string has an optimal time complexity, linear in the length of the string. Moreover, ambiguous regular expressions can be used.
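The naive sketch below only shows what "a parse tree for a regular expression" means: the regex is an AST (char, concat, alt, star), and a match of the whole input yields a tree recording which alternative was taken and how many repetitions were used. This backtracking version can be exponential in the worst case; the paper's algorithms build such trees in linear time.

```python
def parse(rx, s, i):
    """Yield (parse tree, next index) for every way rx can match s from position i."""
    kind = rx[0]
    if kind == "char":
        if i < len(s) and s[i] == rx[1]:
            yield rx[1], i + 1
    elif kind == "concat":
        for t1, j in parse(rx[1], s, i):
            for t2, k in parse(rx[2], s, j):
                yield ("concat", t1, t2), k
    elif kind == "alt":
        for branch, sub in enumerate(rx[1:]):
            for t, j in parse(sub, s, i):
                yield ("alt", branch, t), j      # remember which branch matched
    elif kind == "star":
        yield ("star", []), i                    # zero repetitions
        for t, j in parse(rx[1], s, i):
            if j > i:                            # avoid looping on empty matches
                for (_, rest), k in parse(rx, s, j):
                    yield ("star", [t] + rest), k

# (a|b)* c matched against "abac": the tree records every choice made
rx = ("concat", ("star", ("alt", ("char", "a"), ("char", "b"))), ("char", "c"))
trees = [t for t, j in parse(rx, "abac", 0) if j == len("abac")]
print(trees[0])
```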
Science of Computer Programming | 2005
Mario Latendresse; Marc Feeley
Embedded systems often have severe memory constraints requiring careful encoding of programs. For example, smart cards have on the order of 1K of RAM, 16K of non-volatile memory, and 24K of ROM. A virtual machine can be an effective approach to obtain compact programs but instructions are commonly encoded using one byte for the opcode and multiple bytes for the operands, which can be wasteful and thus limit the size of programs runnable on embedded systems. Our approach uses canonical Huffman codes to generate compact opcodes with custom-sized operand fields and with a virtual machine that directly executes this compact code. We present techniques to automatically generate the new instruction formats and the decoder. In effect, this automatically creates both an instruction set for a customized virtual machine and an implementation of that machine. We demonstrate that, without prior decompression, fast decoding of these virtual compressed instructions is feasible. Through experiments on Scheme and Java, we demonstrate the speed of these decoders. Java benchmarks show an average execution slowdown of 9%. The reductions in size highly depend on the original bytecode and the training samples, but typically vary from 40% to 60%.
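A minimal sketch of the canonical Huffman part of the scheme, with invented opcode names and code lengths: frequent opcodes are assigned short codes in the standard canonical order, and the resulting prefix-free bit string can be decoded greedily. The paper additionally packs custom-sized operand fields and generates the decoder for the virtual machine itself.

```python
def canonical_codes(lengths):
    """lengths: {symbol: code length in bits} -> {symbol: bit string}."""
    syms = sorted(lengths, key=lambda s: (lengths[s], s))
    codes, code, prev_len = {}, 0, 0
    for s in syms:
        code <<= lengths[s] - prev_len            # extend when the length grows
        codes[s] = format(code, "0{}b".format(lengths[s]))
        code += 1
        prev_len = lengths[s]
    return codes

def decode(bits, codes):
    """Decode a bit string into the sequence of symbols it encodes."""
    by_code = {v: k for k, v in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in by_code:                        # prefix-free, so first hit is right
            out.append(by_code[cur])
            cur = ""
    return out

lengths = {"push": 1, "add": 2, "call": 3, "ret": 3}   # frequent opcodes get short codes
codes = canonical_codes(lengths)                        # push=0 add=10 call=110 ret=111
program = ["push", "push", "add", "ret"]
bits = "".join(codes[op] for op in program)
print(codes, bits, decode(bits, codes))
```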
Higher-order and Symbolic Computation \/ Lisp and Symbolic Computation | 1994
Marc Feeley; Marcel Turcotte; Guy Lapalme
This paper describes and evaluates a parallel program for determining the three-dimensional structure of nucleic acids. A parallel constraint satisfaction algorithm is used to search a discrete space of shapes. Using two realistic data sets, we compare a previous sequential version of the program written in Miranda to the new sequential and parallel versions written in C, Scheme, and Multilisp, and explain how these new versions were designed to attain good absolute performance. Critical issues were the performance of floating-point operations, garbage collection, load balancing, and contention for shared data. We found that speedup was dependent on the data set. For the first data set, nearly linear speedup was observed for up to 64 processors whereas for the second the speedup was limited to a factor of 16.
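A skeletal version of the search strategy, with invented placements and a stand-in constraint instead of the paper's geometric computations: each position is assigned one of a small discrete set of placements, a constraint check prunes invalid partial structures, and the top-level choices are explored in parallel.

```python
from concurrent.futures import ProcessPoolExecutor

PLACEMENTS = [0, 1, 2]          # discrete candidate placements per position

def consistent(partial):
    """Stand-in constraint: adjacent placements must differ."""
    return len(partial) < 2 or partial[-1] != partial[-2]

def search(partial, n):
    """Count complete, consistent placements of n positions by backtracking."""
    if not consistent(partial):
        return 0
    if len(partial) == n:
        return 1
    return sum(search(partial + [p], n) for p in PLACEMENTS)

def solve_parallel(n):
    # explore each top-level placement in a separate process
    with ProcessPoolExecutor(max_workers=len(PLACEMENTS)) as pool:
        counts = pool.map(search, ([p] for p in PLACEMENTS), [n] * len(PLACEMENTS))
        return sum(counts)

if __name__ == "__main__":
    print(search([], 8), solve_parallel(8))   # both count the same structures
```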
compiler construction | 2008
Etienne Bergeron; Marc Feeley; Jean Pierre David
JIT compilation is a model of execution which translates critical parts of the program to a low-level representation at run time. Typically a JIT compiler produces machine code from an intermediate bytecode representation. This paper considers a hardware JIT compiler targeting FPGAs, which are digital circuits configurable as needed to implement application-specific circuits. Recent FPGAs in the Xilinx Virtex family are particularly attractive for hardware JIT because they are reconfigurable at run time, they contain both CPUs and reconfigurable logic, and their architecture strikes a balance of features. In this paper we discuss the design of a hardware architecture and compiler able to dynamically enhance the instruction set with specialized hardware instructions. A prototype system based on the Xilinx Virtex family supporting hardware JIT compilation is described and evaluated.
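The sketch below illustrates only the generic JIT execution model from the first sentence, in software: a block that becomes hot is translated once at run time and executed directly thereafter. The hot threshold, block format, and "translation" to a Python closure are invented for illustration; the paper's target is specialized hardware instructions on an FPGA, not software code generation.

```python
HOT_THRESHOLD = 3           # invented hotness criterion
counts, compiled = {}, {}

def interpret(block, x):
    """Slow path: dispatch each (op, constant) pair one at a time."""
    for op, c in block:
        x = x + c if op == "add" else x * c
    return x

def translate(block):
    """Run-time translation: turn the whole block into a single closure."""
    def fast(x):
        for op, c in block:
            x = x + c if op == "add" else x * c
        return x
    return fast

def execute(block_id, block, x):
    if block_id in compiled:
        return compiled[block_id](x)                 # run the translated form
    counts[block_id] = counts.get(block_id, 0) + 1
    if counts[block_id] >= HOT_THRESHOLD:            # block became hot: translate it
        compiled[block_id] = translate(block)
    return interpret(block, x)

block = [("add", 1), ("mul", 2)]                     # computes (x + 1) * 2
print([execute("b0", block, i) for i in range(5)])   # [2, 4, 6, 8, 10]
```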