Håvard Espeland
University of Oslo
Publications
Featured research published by Håvard Espeland.
international symposium on multimedia | 2011
Dag Haavi Finstad; Håkon Kvale Stensland; Håvard Espeland; Pål Halvorsen
Adaptive HTTP streaming is frequently used for both live and on-demand video delivery over the Internet. Adaptiveness is often achieved by encoding the video stream in multiple qualities (and thus bit rates), and then transparently switching between the qualities according to the bandwidth fluctuations and the amount of resources available for decoding the video content on the end device. For this kind of video delivery over the Internet, H.264 is currently the most used codec, but VP8 is an emerging open-source codec expected to compete with H.264 in the streaming scenario. The challenge is that, when encoding video for adaptive video streaming, both VP8 and H.264 run once for each quality layer, consuming both time and resources, which is especially important in a live video delivery scenario. In this paper, we address the resource consumption issues by proposing a method for reusing redundant steps in a video encoder, emitting multiple outputs with varying bit rates and qualities. It shares and reuses the computationally heavy analysis steps, notably macroblock mode decision, intra-prediction and inter-prediction, between the instances, and outputs video at several rates. The method has been implemented in the VP8 reference encoder, and experimental results show that we can encode the different quality layers at the same rates and qualities as the VP8 reference encoder, while reducing the encoding time significantly.
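To make the idea of a shared analysis step concrete, the following toy sketch (plain Python, not the VP8 reference encoder) runs a per-block "analysis" once and reuses it for several quantization levels; the block layout, decision rule and quantizer values are purely illustrative assumptions.

```python
# Conceptual sketch of multi-rate encoding with a shared analysis step.
# This is NOT the VP8 implementation from the paper; "analysis" and
# "encoding" are toy stand-ins for mode decision and quantization.

def analyze_block(block):
    """Expensive per-block analysis (mode decision, prediction search).
    Run once and reused for every quality layer."""
    mean = sum(block) / len(block)
    variance = sum((p - mean) ** 2 for p in block) / len(block)
    mode = "intra" if variance > 100 else "skip"   # toy decision rule
    return {"mode": mode, "mean": mean}

def encode_block(block, analysis, quantizer):
    """Cheap per-layer step: reuse the shared analysis and vary only the
    quantizer to produce different bit rates / qualities."""
    if analysis["mode"] == "skip":
        return []
    return [int(round(p / quantizer)) for p in block]

def encode_multi_rate(blocks, quantizers):
    layers = {q: [] for q in quantizers}
    for block in blocks:
        analysis = analyze_block(block)            # done once per block
        for q in quantizers:
            layers[q].append(encode_block(block, analysis, q))
    return layers

if __name__ == "__main__":
    frame = [[10, 12, 11, 13], [10, 200, 30, 250]]  # two toy 4-pixel blocks
    print(encode_multi_rate(frame, quantizers=[4, 8, 16]))
```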
international conference on parallel processing | 2011
Håvard Espeland; Paul B. Beskow; Håkon Kvale Stensland; Preben N. Olsen; Ståle Kristoffersen; Carsten Griwodz; Pål Halvorsen
The computational demands of multimedia data processing are steadily increasing as consumers call for progressively more complex and intelligent multimedia services. New multi-core hardware architectures provide the required resources, but writing parallel, distributed applications remains a labor-intensive task compared to their sequential counterparts. For this reason, Google and Microsoft implemented their respective processing frameworks MapReduce and Dryad, which allow the developer to think sequentially yet benefit from parallel and distributed execution. An inherent limitation in the design of these batch processing frameworks is their inability to express arbitrarily complex workloads. The dependency graphs of the frameworks are often limited to directed acyclic graphs, or even pre-determined stages. This is particularly problematic for video encoding and other algorithms that depend on iterative execution. With the Nornir runtime system for parallel programs, which is a Kahn Process Network implementation, we addressed and solved several of these limitations. However, it is more difficult to use than other frameworks due to its complex programming model. In this paper, we build on the knowledge gained from Nornir and present a new framework, called P2G, designed specifically for developing and processing distributed real-time multimedia data. P2G supports arbitrarily complex dependency graphs with cycles, branches and deadlines, and provides both data- and task-parallelism. The framework is implemented to scale transparently with available (heterogeneous) resources, a concept familiar from the cloud computing paradigm. We have implemented an (interchangeable) P2G kernel language to ease development. We present a proof-of-concept implementation of a P2G execution node and some experimental examples using complex workloads like Motion JPEG and K-means clustering. The results show that the P2G system is a feasible approach to multimedia processing.
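As an illustration of the kind of iterative, cyclic workload P2G targets, the sketch below expresses K-means as two small "kernels" with a cyclic dependency between them. It is plain Python, not the P2G kernel language; the function names and structure are assumptions made for the example.

```python
# Illustrative only: an iterative workload (K-means) structured as two
# "kernels" whose outputs feed each other in a cycle, the kind of
# dependency graph that DAG-only batch frameworks cannot express.
import random

def assign_kernel(points, centroids):
    """Data-parallel step: each point can be assigned independently."""
    return [min(range(len(centroids)),
                key=lambda c: abs(points[i] - centroids[c]))
            for i in range(len(points))]

def update_kernel(points, assignment, k):
    """Reduce step: recompute centroids from the current assignment."""
    new = []
    for c in range(k):
        members = [p for p, a in zip(points, assignment) if a == c]
        new.append(sum(members) / len(members) if members else random.random())
    return new

def kmeans(points, k, iterations=10):
    centroids = random.sample(points, k)
    for _ in range(iterations):            # the cycle in the dependency graph
        assignment = assign_kernel(points, centroids)
        centroids = update_kernel(points, assignment, k)
    return centroids

if __name__ == "__main__":
    data = [random.uniform(0, 100) for _ in range(200)]
    print(kmeans(data, k=3))
```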
international conference on parallel processing | 2012
Kjetil Raaen; Håvard Espeland; Håkon Kvale Stensland; Andreas Petlund; Pål Halvorsen; Carsten Griwodz
Supporting thousands of interacting players in a virtual world poses huge challenges with respect to processing. Existing work that addresses the challenge utilizes a variety of spatial partitioning algorithms to distribute the load. If, however, a large number of players needs to interact tightly across an area of the game world, spatial partitioning cannot subdivide this area without incurring massive communication costs, latency or inconsistency. It is a major challenge for game engines to scale such areas to the largest possible number of players; in a deviation from earlier thinking, we apply parallelism on multi-core architectures to increase scalability. In this paper, we evaluate the design and implementation of our game server architecture, called LEARS, which allows for lock-free parallel processing of a single spatial partition by considering every game cycle an atomic tick. Our prototype is evaluated using traces from live game sessions where we measure the server response time for all objects that need timely updates. We also measure how the response time for the multi-threaded implementation varies with the number of threads used. Our results show that the challenge of scaling up a game server can be an embarrassingly parallel problem.
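A minimal sketch of the tick-based structure (not the actual LEARS code): each game cycle dispatches all object updates to a worker pool and completes only when every update has finished. The Player class, tick rate and worker count below are illustrative assumptions.

```python
# Sketch of a tick-based parallel game loop: all active objects are
# updated in parallel each cycle, and the tick only completes when every
# update has finished. The real design exchanges state changes through
# messages rather than shared locks; here we only show the
# parallel-dispatch-per-tick structure.
import time
from concurrent.futures import ThreadPoolExecutor

class Player:
    def __init__(self, pid):
        self.pid = pid
        self.x = 0.0

    def update(self, dt):
        self.x += dt          # placeholder for real game logic
        return self.pid

def run_server(players, tick_rate_hz=10, ticks=5, workers=4):
    dt = 1.0 / tick_rate_hz
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for tick in range(ticks):
            start = time.monotonic()
            futures = [pool.submit(p.update, dt) for p in players]
            for f in futures:
                f.result()    # the tick is "atomic": wait for all updates
            elapsed = time.monotonic() - start
            time.sleep(max(0.0, dt - elapsed))
            print(f"tick {tick} processed {len(players)} objects "
                  f"in {elapsed * 1000:.1f} ms")

if __name__ == "__main__":
    run_server([Player(i) for i in range(100)])
```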
The Journal of Supercomputing | 2013
Zeljko Vrba; Pål Halvorsen; Carsten Griwodz; Paul B. Beskow; Håvard Espeland; Dag Johansen
Even though shared-memory concurrency is a paradigm frequently used for developing parallel applications on small- and medium-sized machines, experience has shown that it is hard to use. This is largely caused by synchronization primitives that are low-level, inherently non-deterministic, and, consequently, non-intuitive to use. In this paper, we present the Nornir run-time system. Nornir is comparable to well-known frameworks such as MapReduce and Dryad that are recognized for their efficiency and simplicity. Unlike these frameworks, Nornir also supports process structures containing branches and cycles. Nornir is based on the formalism of Kahn process networks, which is a shared-nothing, message-passing model of concurrency. We deem this model a simple and deterministic alternative to shared-memory concurrency. Experiments with real and synthetic benchmarks on up to 8 CPUs show that performance in most cases scales almost linearly with the number of CPUs, when not limited by data dependencies. We also show that the modeling flexibility allows Nornir to outperform its MapReduce counterparts on well-known benchmarks.
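A minimal Kahn-process-network-style pipeline, assuming threads as processes and bounded blocking FIFOs as channels; it only illustrates the shared-nothing, message-passing model and is not the Nornir API.

```python
# Minimal KPN-style pipeline: processes are threads, channels are blocking
# FIFO queues, and processes communicate only by reading and writing
# channels (no shared mutable state, no explicit locks in user code).
import threading
import queue

SENTINEL = None  # end-of-stream marker

def producer(out_ch, n):
    for i in range(n):
        out_ch.put(i)
    out_ch.put(SENTINEL)

def mapper(in_ch, out_ch):
    while True:
        item = in_ch.get()      # blocking read gives deterministic behavior
        if item is SENTINEL:
            out_ch.put(SENTINEL)
            return
        out_ch.put(item * item)

def consumer(in_ch, results):
    while True:
        item = in_ch.get()
        if item is SENTINEL:
            return
        results.append(item)

if __name__ == "__main__":
    a, b = queue.Queue(maxsize=16), queue.Queue(maxsize=16)
    results = []
    threads = [
        threading.Thread(target=producer, args=(a, 10)),
        threading.Thread(target=mapper, args=(a, b)),
        threading.Thread(target=consumer, args=(b, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results)
```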
network and operating system support for digital audio and video | 2010
Håkon Kvale Stensland; Håvard Espeland; Carsten Griwodz; Pål Halvorsen
When used efficiently, modern multicore architectures, such as the Cell and GPUs, provide the processing power required by resource-demanding multimedia workloads. However, the diversity of resources exposed to the programmer intrinsically requires specific mindsets for efficiently utilizing these resources - not only compared to an x86 architecture, but also between the Cell and the GPUs. In this context, our analysis of 14 different Motion-JPEG implementations indicates that there exists a large potential for optimizing performance, but there are also many pitfalls to avoid. By experimentally evaluating algorithmic choices, inter-core data communication (memory transfers) and architecture-specific capabilities, such as instruction sets, we present tips, tricks and troubles with respect to efficient utilization of the available resources.
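One example of the kind of algorithmic choice evaluated is how the 8x8 DCT at the heart of Motion-JPEG is computed: directly, or as two separable 1-D passes, which maps better to SIMD and many-core hardware. The sketch below (pure Python, illustrative only, not code from the paper) shows both variants producing the same result.

```python
# The 8x8 DCT-II can be computed directly (O(N^4) per block) or as two
# separable 1-D passes (O(N^3)). Both variants below produce identical
# coefficients; only the operation count and data-access pattern differ.
import math

N = 8
C = [1 / math.sqrt(2)] + [1.0] * (N - 1)
COS = [[math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
       for u in range(N)]

def dct2_direct(block):
    # Direct double sum over the whole block for every output coefficient.
    return [[0.25 * C[u] * C[v] *
             sum(block[y][x] * COS[u][x] * COS[v][y]
                 for x in range(N) for y in range(N))
             for u in range(N)] for v in range(N)]

def dct1(vec):
    return [math.sqrt(2.0 / N) * C[u] *
            sum(vec[x] * COS[u][x] for x in range(N)) for u in range(N)]

def dct2_separable(block):
    rows = [dct1(row) for row in block]                        # 1-D pass on rows
    cols = [dct1([rows[y][u] for y in range(N)]) for u in range(N)]  # then columns
    return [[cols[u][v] for u in range(N)] for v in range(N)]

if __name__ == "__main__":
    block = [[(x * 3 + y * 7) % 256 for x in range(N)] for y in range(N)]
    a, b = dct2_direct(block), dct2_separable(block)
    assert all(abs(a[v][u] - b[v][u]) < 1e-6 for u in range(N) for v in range(N))
    print("direct and separable DCT agree")
```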
network and system support for games | 2011
Kjetil Raaen; Håvard Espeland; Håkon Kvale Stensland; Andreas Petlund; Pål Halvorsen; Carsten Griwodz
Games where thousands of players can interact concurrently pose many challenges with regard to massive parallelism. Earlier work within the field suggests that this is difficult due to synchronization issues. In this paper, we present an implementation of a game server architecture based on a model that allows for massive parallelism. The system is evaluated using traces from live game sessions that have been scaled up to generate massive workloads. We measure the differences in server response time for all objects that need timely updates. We also measure how the response time for the multithreaded implementation varies with the number of threads used. Our results show that implementing a game server can actually be a highly parallel problem.
international performance computing and communications conference | 2009
Carl Henrik Lunde; Håvard Espeland; Håkon Kvale Stensland; Pål Halvorsen
Current in-kernel disk schedulers provide efficient means to optimize the order of issued, in-queue I/O requests (and thus minimize disk seeks). However, they fail to optimize sequential multi-file operations, like traversing a large file tree, because only requests from one file are available in the scheduling queue at a time. We have therefore investigated a user-level, I/O request sorting approach to reduce inter-file disk arm movements. This is achieved by allowing applications to utilize the placement of inodes and disk blocks to make a one-sweep schedule for all file I/Os requested by a process, i.e., data placement information is read before the low-level I/O requests are issued to the storage system. Our experiments with a modified version of tar show reduced disk arm movements and large performance improvements.
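A rough sketch of the idea, under a simplifying assumption: instead of querying actual block placement as the paper does, the snippet below orders the files of a tree by device and inode number before reading them, which approximates on-disk layout on many file systems.

```python
# Rough illustration of user-level I/O ordering: read the files of a tree
# in an order that approximates their on-disk layout instead of directory
# order. The paper uses actual inode and block placement information; as a
# portable approximation, this sketch sorts by (device, inode number).
import os

def files_in_tree(root):
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)

def read_tree_sorted(root):
    paths = [p for p in files_in_tree(root) if os.path.isfile(p)]
    # Sort by device and inode number as a proxy for physical placement;
    # a real implementation would query block addresses instead.
    paths.sort(key=lambda p: (os.stat(p).st_dev, os.stat(p).st_ino))
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            total += len(f.read())
    return total

if __name__ == "__main__":
    print(read_tree_sorted("."), "bytes read")
```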
acm multimedia | 2011
Paul B. Beskow; Håkon Kvale Stensland; Håvard Espeland; Espen A. Kristiansen; Preben N. Olsen; Ståle Kristoffersen; Carsten Griwodz; Pål Halvorsen
In this demo, we present the P2G framework designed for processing distributed real-time multimedia data. P2G supports arbitrarily complex dependency graphs with cycles, branches and deadlines. P2G is implemented to scale transparently with available resources, a concept familiar from the cloud computing paradigm. Additionally, P2G supports heterogeneous computing resources, such as x86 and GPU processing cores. We have implemented an interchangeable P2G kernel language, which is meant to expose the fundamental concepts of the P2G programming model and ease application development. Here, we demonstrate the P2G execution node using an MJPEG encoder as an example workload while dynamically adding and removing processing cores.
network and operating system support for digital audio and video | 2008
Håvard Espeland; Carl Henrik Lunde; Håkon Kvale Stensland; Carsten Griwodz; Pål Halvorsen
Today, major newspapers and TV stations make live and on-demand audio/video content available, video-on-demand services are becoming common, and even personal media are frequently uploaded to streaming sites. The discussion about the best transport protocol for streaming has been going on for years. Currently, HTTP streaming is common, although the transport of streaming media data over TCP is hindered by TCP's probing behavior, which results in the rapid reduction and slow recovery of the packet rate. On the other hand, UDP has been criticized for being unfair to TCP, and it is therefore often blocked by access network providers.
MediaSync, Handbook on Multimedia Synchronization | 2018
Vamsidhar Reddy Gaddam; Ragnar Langseth; Håkon Kvale Stensland; Carsten Griwodz; Michael Riegler; Tomas Kupka; Håvard Espeland; Dag Johansen; Håvard D. Johansen; Pål Halvorsen
Multi-camera systems are frequently used in applications such as panorama video creation, free-viewpoint rendering, and 3D reconstruction. A critical aspect for visual quality in these systems is that the cameras are closely synchronized. In our research, we require high-definition panorama videos generated in real time using several cameras in parallel. This is an essential part of our sports analytics system called Bagadus, which has several synchronization requirements. The system is currently in use for soccer games at the Alfheim stadium for Tromsø IL and at the Ullevaal stadium for the Norwegian national soccer team. Each Bagadus installation is capable of combining the video from five 2K cameras into a single 50 fps cylindrical panorama video. Due to proper camera synchronization, the produced panoramas exhibit neither ghosting effects nor other visual inconsistencies at the seams. Our panorama videos are designed to support several members of the trainer team at the same time. Using our system, they are able to pan, tilt, and zoom interactively and independently over the entire field, from an overview shot to close-ups of individual players in arbitrary locations. To create such panoramas, each of our cameras covers one part of the field with small overlapping regions, where the individual frames are transformed and stitched together into a single view. We faced two main synchronization challenges in the panorama generation process. First, to stitch frames together without visual artifacts and inconsistencies due to motion, the shutters in the cameras had to be synchronized with sub-millisecond accuracy. Second, to circumvent the need for software readjustment of color and brightness around the seams between cameras, the exposure settings were synchronized. This chapter describes these synchronization mechanisms, which were designed, implemented, evaluated, and integrated in the Bagadus system.
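The shutter synchronization in Bagadus is done in hardware; purely as an illustration of the alignment requirement, the hypothetical sketch below picks, for each camera, the frame closest to a common panorama tick and checks that the chosen set is aligned within a sub-millisecond tolerance. All names and values here are assumptions, not part of the described system.

```python
# Hypothetical software-side check of frame alignment for one panorama
# tick, given per-frame capture timestamps in seconds. In the real system
# the cameras are shutter-synchronized in hardware, so this check would
# normally pass by construction.

FRAME_INTERVAL = 1.0 / 50          # 50 fps panorama
TOLERANCE = 0.001                  # sub-millisecond alignment requirement

def pick_frames(tick_time, camera_streams):
    """For each camera, pick the frame whose timestamp is closest to the
    panorama tick, and report whether the set is usable for stitching."""
    chosen = []
    for frames in camera_streams:          # frames: list of (timestamp, data)
        ts, data = min(frames, key=lambda f: abs(f[0] - tick_time))
        chosen.append((ts, data))
    spread = max(ts for ts, _ in chosen) - min(ts for ts, _ in chosen)
    return chosen, spread <= TOLERANCE

if __name__ == "__main__":
    streams = [[(i * FRAME_INTERVAL + cam * 0.0002, f"cam{cam}-frame{i}")
                for i in range(5)] for cam in range(5)]
    frames, ok = pick_frames(2 * FRAME_INTERVAL, streams)
    print("aligned within tolerance:", ok)
```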