Mikel Luján | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Mikel Luján is active.

Explore More

Publication

Featured researches published by Mikel Luján.

international conference on parallel processing | 2008

DiSTM: A Software Transactional Memory Framework for Clusters

Christos Kotselidis; Mohammad Ansari; Kim Jarvis; Mikel Luján; Chris C. Kirkham; Ian Watson

While transactional memory (TM) research on shared-memory chip multiprocessors has been flourishing over the last years,limited research has been conducted in the cluster domain. In this paper,we introduce a research platform for exploiting software TMon clusters. The distributed software transactional memory (DiSTM) system has been designed for easy prototyping of TM coherence protocols and it does not rely on a software or hardware implementation of distributed shared memory. Three TM coherence protocols have been implemented and evaluated with established TM benchmarks. The decentralized transactional coherence and consistency protocol has been compared against two centralized protocols that utilize leases. Results indicate that depending on network congestion and amount of contention different protocols perform better.

high performance embedded architectures and compilers | 2008

Steal-on-Abort: Improving Transactional Memory Performance through Dynamic Transaction Reordering

Mohammad Ansari; Mikel Luján; Christos Kotselidis; Kim Jarvis; Chris C. Kirkham; Ian Watson

In transactional memory, aborted transactions reduce performance, and waste computing resources. Ideally, concurrent execution of transactions should be optimally ordered to minimise aborts, but such an ordering is often either complex, or unfeasible, to obtain. This paper introduces a new technique called steal-on-abort , which aims to improve transaction ordering at runtime. Suppose transactions A and B conflict, and B is aborted. In general it is difficult to predict this first conflict, but once observed, it is logical not to execute the two transactions concurrently again. In steal-on-abort, the aborted transaction B is stolen by its opponent transaction A, and queued behind A to prevent concurrent execution of A and B. Without steal-on-abort, transaction B would typically have been restarted immediately, and possibly had a repeat conflict with transaction A. Steal-on-abort requires no application-specific information, modification, or offline pre-processing. In this paper, it is evaluated using a sorted linked list, red-black tree, STAMP-vacation, and Lee-TM. The evaluation reveals steal-on-abort is highly effective at eliminating repeat conflicts, which reduces the amount of computing resources wasted, and significantly improves performance.

international conference on algorithms and architectures for parallel processing | 2008

Lee-TM: A Non-trivial Benchmark Suite for Transactional Memory

Mohammad Ansari; Christos Kotselidis; Ian Watson; Chris C. Kirkham; Mikel Luján; Kim Jarvis

Transactional Memory (TM) is a concurrent programming paradigm that aims to make concurrent programming easier than fine-grain locking, whilst providing similar performance and scalability. Several TM systems have been made available for research purposes. However, there is a lack of a wide range of non-trivial benchmarks with which to thoroughly evaluate these TM systems. This paper introduces Lee-TM, a non-trivial and realistic TM benchmark suite based on Lees routing algorithm. The benchmark suite provides sequential, lock-based, and transactional implementations to enable direct performance comparison. Lees routing algorithm has several of the desirable properties of a non-trivial TM benchmark, such as large amounts of parallelism, complex contention characteristics, and a wide range of transaction durations and lengths. A sample evaluation shows unfavourable transactional performance and scalability compared to lock-based execution, in contrast to much of the published TM evaluations, and highlights the need for non-trivial TM benchmarks.

international conference on parallel architectures and compilation techniques | 2007

A Study of a Transactional Parallel Routing Algorithm

Ian Watson; Chris C. Kirkham; Mikel Luján

Transactional memory proposes an alternative synchronization primitive to traditional locks. Its promise is to simplify the software development of multi-threaded applications while at the same time delivering the performance of parallel applications using (complex and error prone) fine grain locking. This study reports our experience implementing a realistic application using transactional memory (TM). The application is Lees routing algorithm and was selected for its abundance of parallelism but difficulty of expressing it with locks. Each route between a source and a destination point in a grid can be considered a unit of parallelism. Starting from this simple approach, we evaluate the exploitable parallelism of a transactional parallel implementation and explore how it can be adapted to deliver better performance. The adaptations do not introduce locks nor alter the essence of the implemented algorithm, but deliver up to 20 times more parallelism. The adaptations are derived from understanding the application itself and TM. The evaluation simulates an abstracted TM system and, thus, the results are independent of specific software or hardware TM implemented, and describe properties of the application.

Microprocessors and Microsystems | 2014

TERAFLUX: Harnessing dataflow in next generation teradevices

Roberto Giorgi; Rosa M. Badia; François Bodin; Albert Cohen; Paraskevas Evripidou; Paolo Faraboschi; Bernhard Fechner; Guang R. Gao; Arne Garbade; Rahulkumar Gayatri; Sylvain Girbal; Daniel Goodman; Behram Khan; Souad Koliai; Joshua Landwehr; Nhat Minh Lê; Feng Li; Mikel Luján; Avi Mendelson; Laurent Morin; Nacho Navarro; Tomasz Patejko; Antoniu Pop; Pedro Trancoso; Theo Ungerer; Ian Watson; Sebastian Weis; Stéphane Zuckerman; Mateo Valero

The improvements in semiconductor technologies are gradually enabling extreme-scale systems such as teradevices (i.e., chips composed by 1000 billion of transistors), most likely by 2020. Three major challenges have been identified: programmability, manageable architecture design, and reliability. TERAFLUX is a Future and Emerging Technology (FET) large-scale project funded by the European Union, which addresses such challenges at once by leveraging the dataflow principles. This paper presents an overview of the research carried out by the TERAFLUX partners and some preliminary results. Our platform comprises 1000+ general purpose cores per chip in order to properly explore the above challenges. An architectural template has been proposed and applications have been ported to the platform. Programming models, compilation tools, and reliability techniques have been developed. The evaluation is carried out by leveraging on modifications of the HP-Labs COTSon simulator.

international conference on robotics and automation | 2015

Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM

Luigi Nardi; Bruno Bodin; M. Zeeshan Zia; John Mawer; Andy Nisbet; Paul H. J. Kelly; Andrew J. Davison; Mikel Luján; Michael F. P. O'Boyle; Graham D. Riley; Nigel P. Topham; Stephen B. Furber

Real-time dense computer vision and SLAM offer great potential for a new level of scene modelling, tracking and real environmental interaction for many types of robot, but their high computational requirements mean that use on mass market embedded platforms is challenging. Meanwhile, trends in low-cost, low-power processing are towards massive parallelism and heterogeneity, making it difficult for robotics and vision researchers to implement their algorithms in a performance-portable way. In this paper we introduce SLAMBench, a publicly-available software framework which represents a starting point for quantitative, comparable and validatable experimental research to investigate trade-offs in performance, accuracy and energy consumption of a dense RGB-D SLAM system. SLAMBench provides a KinectFusion implementation in C++, OpenMP, OpenCL and CUDA, and harnesses the ICL-NUIM dataset of synthetic RGB-D sequences with trajectory and scene ground truth for reliable accuracy comparison of different implementation and algorithms. We present an analysis and breakdown of the constituent algorithmic elements of KinectFusion, and experimentally investigate their execution time on a variety of multicore and GPU-accelerated platforms. For a popular embedded platform, we also present an analysis of energy efficiency for different configuration alternatives.

international conference on supercomputing | 2009

Understanding the interconnection network of SpiNNaker

Javier Navaridas; Mikel Luján; José Miguel-Alonso; Luis A. Plana; Steve B. Furber

SpiNNaker is a massively parallel architecture designed to model large-scale spiking neural networks in (biological) real-time. Its design is based around ad-hoc multi-core System-on-Chips which are interconnected using a two-dimensional toroidal triangular mesh. Neurons are modeled in software and their spikes generate packets that propagate through the on- and inter-chip communication fabric relying on custom-made on-chip multicast routers. This paper models and evaluates large-scale instances of its novel interconnect (more than 65 thousand nodes, or over one million computing cores), focusing on real-time features and fault-tolerance. The key contribution can be summarized as understanding the properties of the feasible topologies and establishing the stable operation of the SpiNNaker under different levels of degradation. First we derive analytically the topological characteristics of the network, which are later confirmed by experimental work. With the computational model developed, we investigate the topology of SpiNNaker, and compare it with a standard 3-dimensional torus. The novel emergency routing mechanism, implemented within the routers, allows the topology of SpiNNaker to be more robust than the 3-dimensional torus, regardless of the latter having better topological characteristics. Furthermore, we obtain optimal values of two router parameters related with livelock and deadlock avoidance mechanisms.

european conference on parallel processing | 2008

Advanced Concurrency Control for Transactional Memory Using Transaction Commit Rate

Mohammad Ansari; Christos Kotselidis; Kim Jarvis; Mikel Luján; Chris C. Kirkham; Ian Watson

Concurrency control for Transactional Memory (TM) is investigated as a means for improving resource usage by adjusting dynamically the number of concurrently executing transactions. The proposed control system takes as feedback the measured Transaction Commit Rateto adjust the concurrency. Through an extensive evaluation, a new Concurrency Control Algorithm (CCA), called P-only Concurrency Control (PoCC), is shown to perform better than our other four proposed CCAs for a synthetic benchmark, and the STAMP and Lee-TM benchmarks.

field programmable logic and applications | 2014

An empirical evaluation of High-Level Synthesis languages and tools for database acceleration

Oriol Arcas-Abella; Geoffrey Ndu; Nehir Sonmez; Mohsen Ghasempour; Adrià Armejach; Javier Navaridas; Wei Song; John Mawer; Adrián Cristal; Mikel Luján

High Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management Systems (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we utilize to accelerate commonly-used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS depends on a set of orthogonal characteristics, which we highlight for each HLS framework.

digital systems design | 2013

The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices

Marco Solinas; Rosa M. Badia; François Bodin; Albert Cohen; Paraskevas Evripidou; Paolo Faraboschi; Bernhard Fechner; Guang R. Gao; Arne Garbade; Sylvain Girbal; Daniel Goodman; Behran Khan; Souad Koliai; Feng Li; Mikel Luján; Laurent Morin; Avi Mendelson; Nacho Navarro; Antoniu Pop; Pedro Trancoso; Theo Ungerer; Mateo Valero; Sebastian Weis; Ian Watson; Stéphane Zuckermann; Roberto Giorgi

Thanks to the improvements in semiconductor technologies, extreme-scale systems such as teradevices (i.e., composed by 1000 billion of transistors) will enable systems with 1000+ general purpose cores per chip, probably by 2020. Three major challenges have been identified: programmability, manageable architecture design, and reliability. TERAFLUX is a Future and Emerging Technology (FET) large-scale project funded by the European Union, which addresses such challenges at once by leveraging the dataflow principles. This paper describes the project and provides an overview of the research carried out by the TERAFLUX consortium.

Explore More