
Matias Koskela

Tampere University of Technology

Publications

Featured research published by Matias Koskela.

International Symposium on Visual Computing | 2016

Foveated Path Tracing

Matias Koskela; Timo Viitanen; Pekka Jääskeläinen; Jarmo Takala

Virtual Reality (VR) places demanding requirements on the rendering pipeline: the rendering is stereoscopic, and the refresh rate should be as high as 95 Hz to make VR immersive. One promising technique for making the final push to meet these requirements is foveated rendering, where the rendering effort is prioritized on the areas where the user’s gaze lies. This requires rapid adjustment of the level of detail based on screen-space coordinates. Path tracing allows these kinds of changes without much extra work. However, real-time path tracing is a fairly new concept. This paper is a literature review of techniques related to optimizing path tracing with foveated rendering. In addition, we provide a theoretical estimation of the available performance gains and calculate that 94% of the paths could be omitted. For this reason, we predict that path tracing can soon meet the demanding rendering requirements of VR.
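The 94% figure depends on how quickly visual acuity falls off with eccentricity. The sketch below is not the paper's derivation. It assumes a linear minimum-angle-of-resolution model, MAR(e) = omega0 + slope·e, with sample density proportional to 1/MAR². Under that model it estimates the fraction of paths that could be skipped over a circular field of view, compared with uniform full-acuity sampling. The function names and the parameter values are hypothetical.

```python
import math


def relative_density(ecc_deg: float, omega0: float, slope: float) -> float:
    """Sample density relative to the fovea under a linear MAR model.

    Hypothetical model: MAR(e) = omega0 + slope * e, density ~ 1 / MAR^2,
    normalized so that the density at the gaze point (e = 0) is 1.
    """
    mar = omega0 + slope * ecc_deg
    return (omega0 / mar) ** 2


def fraction_of_samples_saved(fov_radius_deg: float, omega0: float,
                              slope: float, steps: int = 10_000) -> float:
    """Fraction of paths that could be skipped over a circular field of view,
    relative to uniform full-acuity sampling (midpoint-rule integration)."""
    de = fov_radius_deg / steps
    needed = 0.0
    for i in range(steps):
        e = (i + 0.5) * de
        # A thin ring at eccentricity e with width de has area 2*pi*e*de.
        needed += relative_density(e, omega0, slope) * 2.0 * math.pi * e * de
    full = math.pi * fov_radius_deg ** 2
    return 1.0 - needed / full
```

For example, `fraction_of_samples_saved(50.0, omega0=1/48, slope=0.022)` evaluates this model for a 50° radius. The result is very sensitive to `omega0`, `slope` and the field of view, so it should not be read as a reproduction of the paper's estimate.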

International Conference on Computer Graphics and Interactive Techniques | 2017

Foveated instant preview for progressive rendering

Matias Koskela; Kalle Immonen; Timo Viitanen; Pekka Jääskeläinen; Joonas Multanen; Jarmo Takala

Progressive rendering, for example Monte Carlo rendering of 360° content for virtual reality headsets, is a time-consuming task. If the 3D artist notices an error while previewing the rendering, he or she must return to editing mode, make the required changes, and restart rendering. A restart is required because the rendering system cannot know which pixels are affected by the change. We propose the use of eye-tracking-based optimization to significantly speed up previewing the artist's points of interest. Moreover, we derive an optimized version of the visual acuity model, which follows the original model more accurately than previous work. The proposed optimization was tested with a comprehensive user study. The participants felt that the preview with the proposed method converged instantly, and the recorded split times show that the preview is 10 times faster than a conventional preview. In addition, the system has no measurable drawbacks on computational performance.
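One simple way to steer a progressive renderer's samples toward the artist's gaze is to draw the next pixels to refine with probability proportional to an acuity weight. The sketch assumes a plain linear acuity falloff as a stand-in for the paper's optimized visual acuity model, which is not reproduced here. `pick_pixels`, `acuity_weight` and the default `pixels_per_degree`, `omega0` and `slope` values are hypothetical.

```python
import math
import random
from typing import List, Tuple


def acuity_weight(px: int, py: int, gaze: Tuple[float, float],
                  pixels_per_degree: float, omega0: float, slope: float) -> float:
    """Relative sampling weight: 1 at the gaze point, falling off as
    1 / MAR(e)^2 with a linear MAR model (hypothetical stand-in)."""
    # Eccentricity in degrees, using a flat-screen small-angle approximation.
    ecc = math.hypot(px - gaze[0], py - gaze[1]) / pixels_per_degree
    return (omega0 / (omega0 + slope * ecc)) ** 2


def pick_pixels(width: int, height: int, gaze: Tuple[float, float], n: int,
                rng: random.Random, pixels_per_degree: float = 20.0,
                omega0: float = 1 / 48, slope: float = 0.022) -> List[Tuple[int, int]]:
    """Draw n pixel coordinates to refine next, favoring the gaze region."""
    coords = [(x, y) for y in range(height) for x in range(width)]
    weights = [acuity_weight(x, y, gaze, pixels_per_degree, omega0, slope)
               for x, y in coords]
    # Weighted sampling with replacement: foveal pixels are drawn more often.
    return rng.choices(coords, weights=weights, k=n)
```

Calling `pick_pixels(64, 64, (32, 32), 5000, random.Random(0))` returns 5000 coordinates concentrated around (32, 32). In an interactive system, the gaze point would be refreshed from the eye tracker before each batch.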

International Conference on Computer Graphics and Interactive Techniques | 2016

Multi bounding volume hierarchies for ray tracing pipelines

Timo Viitanen; Matias Koskela; Pekka Jääskeläinen; Jarmo Takala

High-performance ray tracing on CPUs is now largely based on Multi Bounding Volume Hierarchy (MBVH) trees. We apply MBVH to a fixed-function ray tracing accelerator architecture. According to cycle-level simulations and power analysis, MBVH reduces energy per frame by an average of 24% and improves performance per area by 19% in scenes with incoherent rays. This is owing to its compact memory layout, which reduces DRAM traffic. With primary rays, energy efficiency improves by 15% and performance per area by 20%.
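An MBVH node stores several children (commonly four) instead of two, which is where the compact memory layout comes from. A common way to obtain one is to collapse a binary BVH. Each node repeatedly replaces an internal child with that child's two children until it is full. The sketch assumes this approach with a 4-wide target. It opens the child with the most primitives as a simple stand-in for the surface-area criterion real builders use. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(eq=False)
class BinaryNode:
    """Binary BVH node; a leaf stores a primitive index in `prim`."""
    left: Optional["BinaryNode"] = None
    right: Optional["BinaryNode"] = None
    prim: Optional[int] = None

    def is_leaf(self) -> bool:
        return self.prim is not None


@dataclass(eq=False)
class WideNode:
    """MBVH node whose children are WideNodes or primitive indices."""
    children: List[Union["WideNode", int]]


def leaf_count(node: BinaryNode) -> int:
    if node.is_leaf():
        return 1
    return leaf_count(node.left) + leaf_count(node.right)


def collapse(node: BinaryNode, width: int = 4) -> Union[WideNode, int]:
    """Collapse a binary BVH into a `width`-ary MBVH."""
    if node.is_leaf():
        return node.prim
    children = [node.left, node.right]
    while len(children) < width:
        internal = [i for i, c in enumerate(children) if not c.is_leaf()]
        if not internal:
            break
        # Open the internal child holding the most primitives (a stand-in
        # for the largest-surface-area rule of a real builder).
        i = max(internal, key=lambda k: leaf_count(children[k]))
        opened = children[i]
        children[i:i + 1] = [opened.left, opened.right]
    return WideNode([collapse(c, width) for c in children])


def build_complete(lo: int, hi: int) -> BinaryNode:
    """Hypothetical helper: balanced binary BVH over primitives lo..hi-1."""
    if hi - lo == 1:
        return BinaryNode(prim=lo)
    mid = (lo + hi) // 2
    return BinaryNode(build_complete(lo, mid), build_complete(mid, hi))
```

Collapsing `build_complete(0, 16)` yields a root with four children, each holding four primitives. This cuts the number of node levels a ray must traverse from four to two.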

International Conference on Computer Graphics and Interactive Techniques | 2015

MergeTree: a HLBVH constructor for mobile systems

Timo Viitanen; Matias Koskela; Pekka Jääskeläinen; Heikki Kultala; Jarmo Takala

Powerful hardware accelerators have recently been developed that put interactive ray tracing within reach of even mobile devices. However, supplying the rendering unit with up-to-date acceleration trees remains difficult, so the rendered scenes are mostly static. The restricted memory bandwidth of a mobile device is a challenge when applying GPU-based tree construction algorithms. This paper describes MergeTree, a BVH tree constructor architecture based on the HLBVH algorithm. Its main features of interest are a streaming hierarchy emitter, an external sorting algorithm with provably minimal memory usage, and a hardware priority queue used to accelerate the external sort. In simulations, the resulting unit is three times faster than the state-of-the-art hardware builder based on the binned SAH sweep algorithm.
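The abstract describes MergeTree's external sort only at a high level. As a generic illustration, and not MergeTree's provably memory-minimal algorithm, the sketch shows a textbook external merge sort of Morton-code keys. Fixed-size runs are first sorted as if in a small on-chip buffer. They are then merged through a min-heap priority queue, a software analogue of the hardware priority queue mentioned above. `run_size` and the function names are hypothetical.

```python
import heapq
from typing import List


def sorted_runs(keys: List[int], run_size: int) -> List[List[int]]:
    """Phase 1: sort fixed-size chunks (the part that fits in a small buffer)."""
    return [sorted(keys[s:s + run_size]) for s in range(0, len(keys), run_size)]


def kway_merge(runs: List[List[int]]) -> List[int]:
    """Phase 2: merge all runs at once through a min-heap priority queue."""
    # Heap entries: (key, run index, position within that run).
    heap = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        key, r, i = heapq.heappop(heap)
        out.append(key)
        if i + 1 < len(runs[r]):
            heapq.heappush(heap, (runs[r][i + 1], r, i + 1))
    return out


def external_sort(keys: List[int], run_size: int) -> List[int]:
    return kway_merge(sorted_runs(keys, run_size))
```

The heap never holds more than one entry per run. Its size therefore grows with the number of runs rather than the number of keys, which is what makes a small hardware priority queue sufficient for the merge phase.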

International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications | 2018

Sparse Sampling for Real-time Ray Tracing

Timo Viitanen; Matias Koskela; Kalle Immonen; Markku Mäkitalo; Pekka Jääskeläinen; Jarmo Takala

Ray tracing is an interesting rendering technique but remains too slow for real-time applications. Various algorithmic methods speed up ray tracing through uneven screen-space sampling, e.g., foveated rendering, where sampling is directed by eye tracking. Uneven sampling methods tend to require at least one sample per pixel, limiting their use in real-time rendering. We review recent work on image reconstruction from arbitrarily distributed samples. We argue that these methods will play a major role in the future of real-time ray tracing, allowing a larger fraction of samples to be focused on regions of interest. Potential implementation approaches and challenges are discussed.
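As a minimal example of reconstructing an image from arbitrarily distributed samples, the sketch uses inverse distance weighting over the k nearest samples of each pixel center. This is a deliberately simple stand-in, not one of the methods reviewed in the paper. It assumes samples are given as `(x, y, value)` tuples in continuous pixel coordinates. `reconstruct`, `k` and `power` are hypothetical names.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, value) in pixel coordinates


def reconstruct(samples: List[Sample], width: int, height: int,
                k: int = 4, power: float = 2.0) -> List[List[float]]:
    """Fill every pixel from its k nearest scattered samples (IDW)."""
    image = [[0.0] * width for _ in range(height)]
    for py in range(height):
        for px in range(width):
            cx, cy = px + 0.5, py + 0.5  # pixel center
            # Brute-force k-nearest search, kept simple for clarity.
            nearest = sorted(samples,
                             key=lambda s: (s[0] - cx) ** 2 + (s[1] - cy) ** 2)[:k]
            num = den = 0.0
            for sx, sy, value in nearest:
                d = math.hypot(sx - cx, sy - cy)
                if d < 1e-9:  # a sample sits exactly on the pixel center
                    num, den = value, 1.0
                    break
                w = 1.0 / d ** power
                num += w * value
                den += w
            image[py][px] = num / den
    return image
```

Each pixel is a convex combination of nearby sample values, so no sample is needed at every pixel. The brute-force search is only for readability. A real-time version would need a spatial data structure or a GPU-friendly formulation.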

International Conference on Computer Graphics and Interactive Techniques | 2017

MergeTree: A Fast Hardware HLBVH Constructor for Animated Ray Tracing

Timo Viitanen; Matias Koskela; Pekka Jääskeläinen; Heikki Kultala; Jarmo Takala

Ray tracing is a computationally intensive rendering technique traditionally used in offline high-quality rendering. Powerful hardware accelerators have recently been developed that put real-time ray tracing within reach of even mobile devices. However, rendering animated scenes remains difficult, as updating the acceleration trees for each frame is a memory-intensive process. This article proposes MergeTree, the first hardware architecture for Hierarchical Linear Bounding Volume Hierarchy (HLBVH) construction, designed to minimize memory traffic. For evaluation, the hardware constructor is synthesized on a 28 nm process technology. Compared to a state-of-the-art binned surface area heuristic (SAH) sweep builder, the present work speeds up construction by a factor of 5, reduces build energy by a factor of 3.2, and reduces memory traffic by a factor of 3. A software HLBVH builder on a graphics processing unit (GPU) requires 3.3 times more memory traffic. To take tree quality into account, a rendering accelerator is modeled alongside the builder. Given the use of a top-level build to improve tree quality, the proposed builder reduces system energy per frame by an average of 41% with primary rays and 13% with diffuse rays. In large scenes (more than 500K triangles), the difference is more pronounced: 62% and 35%, respectively.
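HLBVH construction begins as LBVH does: primitives are sorted along a Morton (Z-order) curve. Each range of sorted codes is then split where its highest differing bit changes. The sketch shows only this linear part in software, assuming primitive centroids normalized to the unit cube and 10 bits per axis. The SAH-based top-level build mentioned above is not included, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple, Union

Tree = Union[int, Tuple["Tree", "Tree"]]


def expand_bits(v: int) -> int:
    """Spread the low 10 bits of v so that two zero bits separate each bit."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0xFF0000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v


def morton3d(x: float, y: float, z: float) -> int:
    """30-bit Morton code for a point in the unit cube (x bits highest)."""
    def quantize(t: float) -> int:
        return min(max(int(t * 1024.0), 0), 1023)
    return ((expand_bits(quantize(x)) << 2)
            | (expand_bits(quantize(y)) << 1)
            | expand_bits(quantize(z)))


def build_lbvh(codes: List[int], lo: int = 0, hi: Optional[int] = None) -> Tree:
    """Split a sorted, non-empty Morton-code range at its highest differing bit."""
    if hi is None:
        hi = len(codes)
    if hi - lo == 1:
        return lo  # leaf: index into the sorted primitive array
    first, last = codes[lo], codes[hi - 1]
    if first == last:
        split = (lo + hi) // 2  # identical codes: split the range in half
    else:
        bit = (first ^ last).bit_length() - 1
        # All codes share the bits above `bit`; the first code with `bit`
        # set is the first one >= that prefix followed by a 1 at `bit`.
        split = bisect_left(codes, (last >> bit) << bit, lo, hi)
    return (build_lbvh(codes, lo, split), build_lbvh(codes, split, hi))
```

Every subtree covers a contiguous range of the sorted array. This is what allows the hierarchy to be emitted in a streaming fashion once the codes are sorted.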

International Symposium on System on Chip | 2016

OpenCL programmable exposed datapath high performance low-power image signal processor

Joonas Multanen; Heikki Kultala; Matias Koskela; Timo Viitanen; Pekka Jääskeläinen; Jarmo Takala; Aram Danielyan; Cristovao Cruz

Sophisticated computational imaging algorithms require both high performance and good energy efficiency when executed on mobile devices. A recent trend has been to exploit the abundant data-level parallelism found in general-purpose programmable GPUs. However, for low-power mobile use cases, generic GPUs consume excessive amounts of power. This paper proposes a programmable computational imaging processor that combines 16-bit half-precision SIMD floating-point vector processing capabilities with the power efficiency of an exposed datapath. In comparison to traditional VLIW architectures with similar computational resources, the exposed datapath reduces register file traffic and complexity. These features, together with the specific optimizations enabled by the explicit programming model, result in extremely good power-performance. When synthesized on a 28 nm ASIC technology, the accelerator consumes 71 mW of power while running a state-of-the-art denoising algorithm and occupies only 0.2 mm² of chip area. For the algorithm, energy usage per frame is 7 mJ, which is 10 times less than the best GPU-based implementation found.
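The reported power and energy figures together imply a frame rate. Assume the 71 mW figure is the average power over the whole frame; this is an assumption, since the abstract does not state the measurement interval. The time per frame then follows from E = P·t:

```latex
t_{\text{frame}} = \frac{E_{\text{frame}}}{P}
  = \frac{7\ \text{mJ}}{71\ \text{mW}} \approx 0.099\ \text{s},
\qquad
f = \frac{1}{t_{\text{frame}}} \approx 10\ \text{frames/s}
```

Likewise, the 10 times lower energy figure corresponds to roughly 70 mJ per frame for the best GPU-based implementation found.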

IEEE Global Conference on Signal and Information Processing | 2015

Rapid customization of image processors using Halide

Ville Korhonen; Pekka Jääskeläinen; Matias Koskela; Timo Viitanen; Jarmo Takala

Image processing applications typically involve data-oriented kernels with limited control divergence. To exploit this data-level parallelism efficiently, image processors include SIMD instructions and other parallel computation resources. Generic off-the-shelf processors are adequate for most image processing use cases. However, especially in embedded mobile devices, they might not be optimal for the algorithm, the environment, or the energy budget at hand. Such cases call for programmable customized architectures with just enough hardware resources to ensure that the high-priority applications reach their real-time goals with minimal overheads. To maintain high engineer productivity, implementing image algorithms for customized processors should be as easy as for standard processors. This is especially important during processor co-design. Because the program drives the design space exploration towards an optimized architecture, assembly programming is not feasible: it would require porting effort whenever the architecture is modified. In this paper, we propose an image processor customization flow that uses the domain-specific Halide language as an input to a processor co-design environment. In addition to efficiently exploiting standard resources in the customized processors, the flow provides an easy way to invoke special instructions from Halide programs. We validate the performance benefits of custom operations using example filters described in the Halide language.
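To illustrate the kind of data-oriented kernel meant here, the sketch writes a separable 3×3 box blur as pure per-pixel functions. It follows the style of the well-known Halide blur example (`blur_x` followed by `blur_y`). It is plain Python, not Halide. In Halide, the same definitions would be kept separate from a schedule that maps them onto SIMD lanes or custom instructions. The clamp-to-edge boundary handling and the names are assumptions for illustration.

```python
from typing import List

Image = List[List[float]]


def blur_3x3(img: Image) -> Image:
    """Separable 3x3 box blur with clamp-to-edge boundaries."""
    h, w = len(img), len(img[0])

    def inp(x: int, y: int) -> float:
        # Boundary condition: clamp coordinates to the image edge.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    def blur_x(x: int, y: int) -> float:
        return (inp(x - 1, y) + inp(x, y) + inp(x + 1, y)) / 3

    def blur_y(x: int, y: int) -> float:
        return (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3

    # The same arithmetic is evaluated independently at every output pixel.
    return [[blur_y(x, y) for x in range(w)] for y in range(h)]
```

Because every output pixel is an independent evaluation of the same expression, a compiler is free to vectorize or tile it. That freedom is what a customized processor's SIMD resources or special instructions exploit.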

Signal Processing Systems | 2018

Software Defined Radio Implementation of a Digital Self-interference Cancellation Method for Inband Full-Duplex Radio Using Mobile Processors

Mona Aghababaeetafreshi; Dani Korpi; Matias Koskela; Pekka Jääskeläinen; Mikko Valkama; Jarmo Takala

New means to improve spectral efficiency and flexibility in radio spectrum use are in high demand due to the congestion of the available spectral resources. Systems deploying inband full-duplex transmission aim to provide higher spectral efficiency through concurrent transmission and reception on the same frequency. Because it can potentially double system throughput, full-duplex communication is considered an enabling technology for upcoming 5G networks. However, system performance is degraded by strong self-interference (SI), caused by the high-power transmit signal overlapping with the received signal of interest. Furthermore, due to commonly existing radio frequency imperfections, advanced techniques capable of mitigating nonlinear SI are required. This article presents a real-time software-defined implementation of a digital SI canceller for full-duplex transceivers, potentially applicable even in mobile-scale devices. Software-defined radio has recently gained a lot of interest due to its higher flexibility, scalability, and shorter time-to-market cycles compared to traditional fixed-function hardware designs. Moreover, as the performance gains achievable by increasing the clock frequency are reaching their limits, the current trend is towards multi-core processors. Since contemporary mobile phones already contain powerful, massively parallel GPUs and CPUs, the feasibility of a real-time implementation on mobile processors is studied. The reported results show that, by adopting the presented solution, it is possible to achieve sufficient SI cancellation under time-varying coupling channel conditions. Additionally, the possibility of carrying out such advanced processing in real time on the selected platforms is investigated. The implementation is evaluated in terms of execution time, power, and energy consumption.
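A common way to model nonlinear SI digitally is to express the received interference as a linear combination of delayed, nonlinearly distorted copies of the known transmit signal. The coefficients are then fitted by least squares. The sketch assumes such a memory-polynomial basis with odd orders 1 and 3 and three taps. It is a generic illustration, not necessarily the canceller implemented in the article. `basis_matrix` and `cancel_si` are hypothetical names.

```python
import numpy as np


def basis_matrix(x: np.ndarray, memory: int, orders=(1, 3)) -> np.ndarray:
    """Columns x[n-k] * |x[n-k]|**(p-1) for each odd order p and delay k."""
    n = len(x)
    cols = []
    for p in orders:
        distorted = x * np.abs(x) ** (p - 1)  # nonlinear term, keeps the phase of x
        for k in range(memory):
            col = np.zeros(n, dtype=complex)
            col[k:] = distorted[:n - k]       # delay by k samples
            cols.append(col)
    return np.stack(cols, axis=1)


def cancel_si(tx: np.ndarray, rx: np.ndarray, memory: int = 3,
              orders=(1, 3)) -> np.ndarray:
    """Estimate SI coefficients by least squares and subtract the SI estimate."""
    phi = basis_matrix(tx, memory, orders)
    coeffs, *_ = np.linalg.lstsq(phi, rx, rcond=None)
    return rx - phi @ coeffs
```

The basis construction and the least-squares solve are dense, data-parallel linear algebra, which is the kind of work that maps onto multi-core CPUs and mobile GPUs. Under a time-varying coupling channel, the coefficients would have to be re-estimated block by block.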

Proceedings of the ACM on Computer Graphics and Interactive Techniques | 2018

PLOCTree: A Fast, High-Quality Hardware BVH Builder

Timo Viitanen; Matias Koskela; Pekka Jääskeläinen; Aleksi Tervo; Jarmo Takala

In the near future, GPUs are expected to have hardware support for real-time ray tracing in order to, e.g., help render complex lighting effects in video games and enable photorealistic augmented reality. One challenge in real-time ray tracing is dynamic scene support, that is, rebuilding or updating the spatial data structures used to accelerate rendering whenever the scene geometry changes. This paper proposes PLOCTree, an accelerator for tree construction based on the Parallel Locally-Ordered Clustering (PLOC) algorithm. Because tree construction is highly memory-intensive, the algorithm is rewritten for the hardware implementation into a bandwidth-economical form. This form converts most of the external memory traffic of the original software-based GPU implementation into streaming on-chip data traffic. As a result, the proposed unit is 3.9 times faster and uses 7.7 times less memory bandwidth than the GPU implementation. Compared to state-of-the-art hardware builders, PLOCTree gives a superior performance-quality trade-off. It is nearly as fast as a state-of-the-art low-quality linear builder, while producing trees with a Surface Area Heuristic (SAH) cost similar to that of a comparatively expensive binned SAH sweep builder.
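For reference, the software form of the PLOC algorithm can be sketched as follows. Clusters are kept in Morton order. Each cluster looks, within a small index window, for the neighbor whose merged bounding box has the smallest surface area. Mutually nearest pairs are then merged, and this repeats until one cluster remains. The sketch assumes the input boxes are already Morton-sorted and uses a hypothetical `radius` parameter. It does not model PLOCTree's bandwidth-economical hardware reformulation. Ties are broken toward the lower index, which guarantees at least one mutual pair, and hence progress, in every iteration.

```python
from typing import List, Tuple, Union

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)
Tree = Union[int, Tuple["Tree", "Tree"]]


def union(a: Box, b: Box) -> Box:
    lo = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return lo, hi


def surface_area(box: Box) -> float:
    dx, dy, dz = (hi - lo for lo, hi in zip(box[0], box[1]))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def ploc(boxes: List[Box], radius: int = 2) -> Tree:
    """Build a binary BVH over non-empty, Morton-sorted boxes by PLOC-style merging."""
    if radius < 1:
        raise ValueError("radius must be at least 1")
    clusters = [(box, i) for i, box in enumerate(boxes)]
    while len(clusters) > 1:
        n = len(clusters)
        # 1) Nearest-neighbor search inside the window [i - radius, i + radius].
        nearest = []
        for i in range(n):
            best, best_cost = -1, float("inf")
            for j in range(max(0, i - radius), min(n, i + radius + 1)):
                if j != i:
                    cost = surface_area(union(clusters[i][0], clusters[j][0]))
                    if cost < best_cost:  # strict: ties go to the lower index
                        best, best_cost = j, cost
            nearest.append(best)
        # 2) Merge mutual nearest neighbors; 3) compact while keeping order.
        merged = []
        for i in range(n):
            j = nearest[i]
            if nearest[j] == i:
                if i < j:  # emit each merged pair once, at the lower index
                    box = union(clusters[i][0], clusters[j][0])
                    merged.append((box, (clusters[i][1], clusters[j][1])))
            else:
                merged.append(clusters[i])
        clusters = merged
    return clusters[0][1]
```

Each iteration only reads a cluster's immediate neighbors in the ordered array. This locality is what makes it plausible to turn the algorithm's memory traffic into on-chip streaming.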
