Daniele Palossi | Researchain

Publication

Featured research published by Daniele Palossi.

design, automation, and test in europe | 2016

Enabling the heterogeneous accelerator model on ultra-low power microcontroller platforms

Francesco Conti; Daniele Palossi; Andrea Marongiu; Davide Rossi; Luca Benini

The stringent power constraints of complex microcontroller-based devices (e.g., smart sensors for the IoT) represent an obstacle to the introduction of sophisticated functionality. Programmable accelerators would be extremely beneficial to provide the flexibility and energy efficiency required by fast-evolving IoT applications; however, the integration complexity and sub-10 mW power budgets have been considered insurmountable obstacles so far. In this paper we demonstrate the feasibility of coupling a low-power microcontroller unit (MCU) with a heterogeneous programmable accelerator for speeding up computation-intensive algorithms at an ultra-low-power (ULP) sub-10 mW budget. Specifically, we develop a heterogeneous architecture coupling a Cortex-M series MCU with PULP, a programmable accelerator for ULP parallel computing. Complex functionality is enabled by the support for offloading parallel computational kernels from the MCU to the accelerator using the OpenMP programming model. We prototype this platform using an STM Nucleo board and a PULP FPGA emulator. We show that our methodology can deliver up to 60× gains in performance and energy efficiency on a diverse set of applications, opening the way for a new class of ULP heterogeneous architectures.

computing frontiers | 2016

An energy-efficient parallel algorithm for real-time near-optimal UAV path planning

Daniele Palossi; Michele Furci; Roberto Naldi; Andrea Marongiu; Lorenzo Marconi; Luca Benini

We propose a shortest trajectory planning algorithm implementation for Unmanned Aerial Vehicles (UAVs) on an embedded GPU. Our goal is the development of a fast, energy-efficient global planner for multi-rotor UAVs supporting human operators during rescue missions. The work is based on an OpenCL parallel, non-deterministic version of the Dijkstra algorithm to solve the Single Source Shortest Path (SSSP) problem. Our planner is suitable for real-time path re-computation in dynamically varying environments of up to 200 m². Results demonstrate the efficacy of the approach, showing speedups of up to 74×, saving up to ~98% of energy versus the sequential benchmark, while reaching near-optimal path selection, keeping the average path cost error smaller than 1.2%.
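For orientation, the following is a minimal sketch of a sequential Dijkstra search, the kind of baseline a parallel planner is typically compared against, together with a relative path-cost error metric. It is not the authors' OpenCL implementation: the occupancy-grid map, 4-connected moves with unit cost, and the names `dijkstra_grid` and `relative_cost_error` are all assumptions made for illustration.

```python
import heapq


def dijkstra_grid(grid, start, goal):
    """Cost of the shortest 4-connected path on an occupancy grid.

    Assumed representation: grid[r][c] == 0 is free, 1 is an obstacle;
    every move costs 1. Returns None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return d
        if d > dist[(r, c)]:
            continue  # stale queue entry, a shorter path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return None


def relative_cost_error(approx_cost, optimal_cost):
    """Relative excess cost of an approximate path over the optimal one."""
    return (approx_cost - optimal_cost) / optimal_cost


# Hypothetical 3x3 map with a wall across the middle row.
example_grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
optimal = dijkstra_grid(example_grid, (0, 0), (2, 0))
```

Under this metric, a near-optimal planner in the sense of the abstract would keep `relative_cost_error` below 0.012 on average when measured against the optimal cost.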

design, automation, and test in europe | 2017

Ultra low-power visual odometry for nano-scale unmanned aerial vehicles

Daniele Palossi; Andrea Marongiu; Luca Benini

One of the fundamental functionalities for autonomous navigation of Unmanned Aerial Vehicles (UAVs) is the hovering capability. State-of-the-art techniques for implementing hovering on standard-size UAVs process the camera stream to determine position and orientation (visual odometry). Similar techniques are considered unaffordable in the context of nano-scale UAVs (i.e., a few centimeters in diameter), where the ultra-constrained power envelopes of tiny rotorcraft limit the onboard computational capabilities to those of low-power microcontrollers. In this work we study how the emerging ultra-low-power parallel computing paradigm could enable the execution of complex hovering algorithmic flows on nano-scale UAVs. We provide insight into the software pipeline, the parallelization opportunities and the impact of several algorithmic enhancements. Results demonstrate that the proposed software flow and architecture can deliver unprecedented GOPS/W, achieving 117 frames per second within a power envelope of 10 mW.

computing frontiers | 2017

Self-Sustainability in Nano Unmanned Aerial Vehicles: A Blimp Case Study

Daniele Palossi; Andres Gomez; Stefan Draskovic; Kevin Keller; Luca Benini; Lothar Thiele

Nowadays, nano Unmanned Aerial Vehicles (UAVs), such as quad-copters, have very limited flight times, tens of minutes at most. The main constraints are the energy density of the batteries and the engine power required for flight. In this work, we present a nano-sized blimp platform, consisting of a helium balloon and a rotorcraft. Thanks to the lift provided by helium, the blimp requires relatively little energy to remain at a stable altitude. We also introduce the concept of duty-cycling high-power actuators, to reduce the energy requirements for hovering even further. With the addition of a solar panel, it is even feasible to sustain tens or hundreds of flight hours in modest lighting conditions (including indoor usage). A functioning 52-gram prototype was thoroughly characterized and its lifetime was measured in different harvesting conditions. Both our system model and the experimental results indicate that our proposed platform requires less than 200 mW to hover in a self-sustainable fashion. This represents, to the best of our knowledge, the first nano-size UAV for long-term hovering with low power requirements.

ieee international workshop on advances in sensors and interfaces | 2017

Target following on nano-scale Unmanned Aerial Vehicles

Daniele Palossi; Jaskirat Singh; Michele Magno; Luca Benini

Unmanned Aerial Vehicles (UAVs) with high-level autonomous navigation capabilities are a hot topic both in industry and academia due to their numerous applications. However, autonomous navigation algorithms are demanding from the computational standpoint, and it is very challenging to run them on board nano-scale UAVs (i.e., a few centimeters in diameter) because of the limited capabilities of their MCU-based controllers. This work focuses on the object tracking capability (i.e., target following capability) on such nano-UAVs. We present a lightweight hardware-software solution, bringing autonomous navigation to a commercial platform using only on-board computational resources. Furthermore, we evaluate a parallel ultra-low-power (PULP) platform that enables the execution of even more sophisticated algorithms. Experimental results demonstrate the benefits of our solution, achieving accurate target following using an ARM Cortex-M4 microcontroller consuming ≈ 130 mW. Our evaluation on a PULP architecture shows the proposed solution running at up to 60 frames per second in a power envelope of ≈ 30 mW, leaving more than 70% of the computational resources free for further on-board processing of more complex algorithms.

software and compilers for embedded systems | 2016

Exploring Single Source Shortest Path Parallelization on Shared Memory Accelerators

Daniele Palossi; Andrea Marongiu

Single Source Shortest Path (SSSP) algorithms are widely used in embedded systems for several applications. The emerging trend towards the adoption of heterogeneous designs in embedded devices, where low-power parallel accelerators are coupled to the main processor, opens new opportunities to deliver superior performance/watt, but calls for efficient parallel SSSP implementations. In this work we provide a detailed exploration of the Δ-stepping algorithm performance on a representative heterogeneous embedded system, TI Keystone II, considering the impact of several parallelization parameters (threading, load balancing, synchronization).
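As background, the sketch below shows the general structure of Δ-stepping in plain sequential Python. It is not the implementation evaluated in the paper: the adjacency-list format, the function name `delta_stepping`, and the example graph are assumptions for illustration. The idea is that tentative distances are grouped into buckets of width Δ, light edges (weight ≤ Δ) are relaxed repeatedly within the current bucket, and heavy edges are relaxed once the bucket has settled. In a parallel version, the relaxations generated from one bucket are the work that gets distributed across threads.

```python
import math
from collections import defaultdict


def delta_stepping(graph, source, delta):
    """Sequential sketch of delta-stepping SSSP.

    Assumed input: graph maps every node to a list of (neighbor, weight)
    pairs with non-negative weights; delta > 0 is the bucket width.
    Returns a dict of shortest distances from source.
    """
    dist = {v: math.inf for v in graph}
    buckets = defaultdict(set)

    def relax(v, d):
        # Move v to the bucket matching its improved tentative distance.
        if d < dist[v]:
            if dist[v] != math.inf:
                buckets[int(dist[v] // delta)].discard(v)
            dist[v] = d
            buckets[int(d // delta)].add(v)

    relax(source, 0)
    while True:
        nonempty = [k for k, b in buckets.items() if b]
        if not nonempty:
            break
        i = min(nonempty)
        settled = set()
        # Light-edge phase: nodes may re-enter bucket i, so loop until empty.
        while buckets[i]:
            frontier = buckets[i]
            buckets[i] = set()
            settled |= frontier
            requests = [(v, dist[u] + w)
                        for u in frontier
                        for v, w in graph[u] if w <= delta]
            for v, d in requests:
                relax(v, d)
        # Heavy-edge phase: distances of settled nodes are now final.
        for u in settled:
            for v, w in graph[u]:
                if w > delta:
                    relax(v, dist[u] + w)
    return dist


# Hypothetical example graph.
example_graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 3)],
    "d": [],
}
example_dist = delta_stepping(example_graph, "a", 2)
```

The choice of Δ trades work efficiency against available parallelism: a small Δ behaves more like Dijkstra's algorithm, while a large Δ approaches Bellman-Ford-style relaxation with more concurrent work but more re-relaxations.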

software and compilers for embedded systems | 2017

On the Accuracy of Near-Optimal GPU-Based Path Planning for UAVs

Daniele Palossi; Andrea Marongiu; Luca Benini

Path planning is one of the key functional blocks for any autonomous unmanned aerial vehicle (UAV). The goal of a path planner module is to constantly update the route of the vehicle based on information sensed in real time. Given the high computational requirements of this task, heterogeneous many-cores are appealing candidates for its execution. Approximate path computation has proven a promising approach to reduce total execution time, at the cost of a slight loss in accuracy. In this work we study the performance and accuracy of state-of-the-art, near-optimal parallel path planning in combination with program transformations aimed at ensuring efficient use of embedded GPU resources. We propose a profile-based algorithmic variant which boosts GPU execution by up to ≈ 7×, while maintaining the accuracy loss below 5%.

international conference on conceptual structures | 2017

GPU-Accelerated Real-Time Path Planning and the Predictable Execution Model

Björn Forsberg; Daniele Palossi; Andrea Marongiu; Luca Benini

Path planning is one of the key functional blocks for autonomous vehicles constantly updating their route in real time. Heterogeneous many-cores are appealing candidates for its execution, but the high degree of resource sharing results in very unpredictable timing behavior. The predictable execution model (PREM) has the potential to enable the deployment of real-time applications on top of commercial off-the-shelf (COTS) heterogeneous systems by separating compute and memory operations, and scheduling the latter in an interference-free manner. This paper studies PREM applied to a state-of-the-art path planner running on an NVIDIA Tegra X1, providing insight into memory sharing and its impact on performance and predictability. The results show that PREM reduces the execution time variance to near-zero, providing a 3× decrease in the worst-case execution time.

IEEE Transactions on Human-Machine Systems | 2017