Tyler Sorensen | Researchain

Publication

Featured research published by Tyler Sorensen.

Architectural Support for Programming Languages and Operating Systems | 2015

GPU Concurrency: Weak Behaviours and Programming Assumptions

Jade Alglave; Mark Batty; Alastair F. Donaldson; Ganesh Gopalakrishnan; Jeroen Ketema; Daniel Poetzl; Tyler Sorensen; John Wickerson

Concurrency is pervasive and perplexing, particularly on graphics processing units (GPUs). Current specifications of languages and hardware are inconclusive; thus programmers often rely on folklore assumptions when writing software. To remedy this state of affairs, we conducted a large empirical study of the concurrent behaviour of deployed GPUs. Armed with litmus tests (i.e. short concurrent programs), we questioned the assumptions in programming guides and vendor documentation about the guarantees provided by hardware. We developed a tool to generate thousands of litmus tests and run them under stressful workloads. We observed a litany of previously elusive weak behaviours, and exposed folklore beliefs about GPU programming (often supported by official tutorials) as false. As a way forward, we propose a model of Nvidia GPU hardware, which correctly models every behaviour witnessed in our experiments. The model is a variant of SPARC Relaxed Memory Order (RMO), structured following the GPU concurrency hierarchy.
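To make the term "litmus test" concrete, the following is a minimal sketch of the classic message-passing (MP) test, written in plain Python rather than for a GPU. It is not the authors' tool: the names `THREAD0`, `THREAD1`, `interleavings`, `run`, and `sc_outcomes` are hypothetical, and the code only enumerates the results allowed under sequential consistency (SC). Any result seen on real hardware that falls outside this set would count as a weak behaviour.

```python
from itertools import combinations

# Message-passing (MP) litmus test as two straight-line threads (illustrative).
# Each instruction is ("store", location, value) or ("load", location, register).
THREAD0 = [("store", "data", 1), ("store", "flag", 1)]
THREAD1 = [("load", "flag", "r0"), ("load", "data", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]


def run(schedule):
    """Execute one interleaving against a single shared memory (SC semantics)."""
    memory = {"data": 0, "flag": 0}
    registers = {}
    for op, location, arg in schedule:
        if op == "store":
            memory[location] = arg
        else:
            registers[arg] = memory[location]
    return registers["r0"], registers["r1"]


def sc_outcomes():
    """Return the set of (r0, r1) results reachable under sequential consistency."""
    return {run(s) for s in interleavings(THREAD0, THREAD1)}


if __name__ == "__main__":
    outcomes = sc_outcomes()
    print(sorted(outcomes))
    # The weak outcome: the flag is observed as set, but the data is still stale.
    print("weak outcome allowed under SC:", (1, 0) in outcomes)
```

Under SC the result `r0 = 1, r1 = 0` never appears, which is why observing it on a device is evidence that stores or loads were reordered.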

Journal of Cataract and Refractive Surgery | 2012

Ultrasound-induced corneal incision contracture survey in the United States and Canada.

Tyler Sorensen; Clara C. Chan; Michael J. Bradley; Rosa Braga-Mele; Randall J. Olson

PURPOSE: To ascertain factors associated with corneal incision contracture (wound burn) secondary to phacoemulsification in the United States and Canada.
SETTING: John A. Moran Eye Center, University of Utah, Salt Lake City, Utah, USA, and University of Toronto, Toronto, Ontario, Canada.
DESIGN: Cross-sectional study.
METHODS: Through state and provincial societies, members were queried as to cataract surgery practices during the previous 3 years as well as the specifics associated with each case of wound burn, if any, encountered during that period.
RESULTS: Eight hundred forty-two cataract surgeons reported on 920,095 surgeries and 341 wound burns (raw incidence 0.037%). After a multivariate analysis, the wound burn incidence was significantly inversely associated with the surgeon’s surgical volume (45% decrease per doubling of volume; 95% confidence interval, 38%-55%; P<.001), the surgical approach (P<.001), and the ophthalmic viscosurgical device (OVD) used (P=.004). Machine or ultrasound modality used, region of the U.S. or Canada, and incision size were not related to wound burn.
CONCLUSION: Phacoemulsification-induced wound burn can be reduced by experience, by the approach used in nucleus disassembly, by choice of OVD, and most important, by not using ultrasound when the anterior chamber is filled with OVD.
Financial Disclosure: Dr. Braga-Mele is a consultant to Abbott Medical Optics, Inc., Alcon Laboratories, Inc., and Bausch & Lomb. Dr. Olson has been a consultant to Abbott Medical Optics, Inc., Becton, Dickinson and Co., and Allergan, Inc. He has received grant support from Abbott Medical Optics, Inc. and Allergan, Inc. No other author has a financial or proprietary interest in any material or method mentioned.
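The arithmetic behind the reported figures can be checked with a short sketch. The counts and the 45% point estimate come from the abstract; the function names are hypothetical, and treating the per-doubling decrease as compounding multiplicatively across several doublings is an assumption made here for illustration, not a claim about the study's statistical model.

```python
import math

SURGERIES = 920_095            # surgeries reported in the abstract
WOUND_BURNS = 341              # wound burns reported in the abstract
DECREASE_PER_DOUBLING = 0.45   # point estimate reported in the abstract


def raw_incidence_percent(events, total):
    """Raw incidence as a percentage of all surgeries."""
    return 100.0 * events / total


def relative_incidence(volume_ratio, decrease=DECREASE_PER_DOUBLING):
    """Incidence relative to a baseline surgeon, for a surgeon whose volume is
    volume_ratio times the baseline. Assumes (for illustration only) that the
    per-doubling decrease compounds multiplicatively."""
    doublings = math.log2(volume_ratio)
    return (1.0 - decrease) ** doublings


if __name__ == "__main__":
    print(f"raw incidence: {raw_incidence_percent(WOUND_BURNS, SURGERIES):.3f}%")
    for ratio in (2, 4, 8):
        print(f"{ratio}x volume: {relative_incidence(ratio):.3f} of baseline incidence")
```

Under that assumption, a surgeon with four times the baseline volume would have roughly 30% of the baseline incidence (0.55 squared).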

Programming Language Design and Implementation | 2016

Exposing errors related to weak memory in GPU applications

Tyler Sorensen; Alastair F. Donaldson

We present the systematic design of a testing environment that uses stressing and fuzzing to reveal errors in GPU applications that arise due to weak memory effects. We evaluate our approach on seven GPUs spanning three Nvidia architectures, across ten CUDA applications that use fine-grained concurrency. Our results show that applications that rarely or never exhibit errors related to weak memory when executed natively can readily exhibit these errors when executed in our testing environment. Our testing environment also provides a means to help identify the root causes of such errors, and automatically suggests how to insert fences that harden an application against weak memory bugs. To understand the cost of GPU fences, we benchmark applications with fences provided by the hardening strategy as well as a more conservative, sound fencing strategy.

International Conference on Supercomputing | 2013

Towards shared memory consistency models for GPUs

Tyler Sorensen; Ganesh Gopalakrishnan; Vinod Grover

With the widespread use of graphics processing units (GPUs), it is important to ensure that programmers have a clear understanding of their shared memory consistency model, i.e. which values a read may return when it is issued concurrently with writes. Compared to CPUs, GPUs present different shared memory behavior, and we know of no published formal consistency model for them. To fill this void, we establish a formal state transition model of GPU loads, stores, and fences in the language Murphi. We then use the Murphi model checker to check properties over executions; these properties are captured in litmus tests that pertain to ordering and visibility.

Conference on Object-Oriented Programming Systems, Languages, and Applications | 2016

Portable inter-workgroup barrier synchronisation for GPUs

Tyler Sorensen; Alastair F. Donaldson; Mark Batty; Ganesh Gopalakrishnan; Zvonimir Rakamarić

Despite the growing popularity of GPGPU programming, there is not yet a portable and formally-specified barrier that one can use to synchronise across workgroups. Moreover, the occupancy-bound execution model of GPUs breaks assumptions inherent in traditional software execution barriers, exposing them to deadlock. We present an occupancy discovery protocol that dynamically discovers a safe estimate of the occupancy for a given GPU and kernel, allowing for a starvation-free (and hence, deadlock-free) inter-workgroup barrier by restricting the number of workgroups according to this estimate. We implement this idea by adapting an existing, previously non-portable, GPU inter-workgroup barrier to use OpenCL 2.0 atomic operations, and prove that the barrier meets its natural specification in terms of synchronisation. We assess the portability of our approach over eight GPUs spanning four vendors, comparing the performance of our method against alternative methods. Our key findings include: (1) the recall of our discovery protocol is nearly 100%; (2) runtime comparisons vary substantially across GPUs and applications; and (3) our method provides portable and safe inter-workgroup synchronisation across the applications we study.

Canadian Journal of Ophthalmology / Journal canadien d'ophtalmologie | 2012

A comparison of cataract surgical practices in Canada and the United States

Tyler Sorensen; Clara C. Chan; Michael Bradley; Rosa Braga-Mele; Randall J. Olson

OBJECTIVE: To determine cataract surgical practices in Canada and the United States.
DESIGN: Cross-sectional study.
PARTICIPANTS: 1250 clinician members of the Canadian Ophthalmological Society and U.S. state societies.
METHODS: This survey updated and expanded upon results of a 5-state survey published in 2006. Practices for the preceding 3 years were determined for the phacoemulsification machine, the ultrasound modality, the surgical approach, the viscoelastic, and the wound size used during cataract surgery.
RESULTS: The participating surgeons responded concerning 963,543 surgeries. Canada had the busiest surgeons, who were more likely than their U.S. counterparts to use vertical chop and DisCoVisc. Surgeons above the median in surgical volume were more likely to use ultrapulse ultrasound, less likely to use a divide-and-conquer approach, and more likely to use a vertical chopping approach than were those below the median. The northeastern United States had the least busy surgeons. OZil is the most common ultrasound modality used today.
CONCLUSIONS: This expanded survey revealed that practice patterns vary quite widely. Furthermore, the preponderance of OZil ultrasound since the 2006 survey shows that changes in the field can happen very rapidly.

International Workshop on OpenCL | 2016

The Hitchhiker's Guide to Cross-Platform OpenCL Application Development

Tyler Sorensen; Alastair F. Donaldson

One of the benefits of programming in OpenCL is platform portability. That is, an OpenCL program that follows the OpenCL specification should, in principle, execute reliably on any platform that supports OpenCL. To assess the current state of OpenCL portability, we provide an experience report examining two sets of open source benchmarks that we attempted to execute across a variety of GPU platforms, via OpenCL. We report on the portability issues we encountered, where applications would execute successfully on one platform but fail on another. We classify issues into three groups: (1) framework bugs, where the vendor-provided OpenCL framework fails; (2) specification limitations, where the OpenCL specification is unclear and where different GPU platforms exhibit different behaviours; and (3) programming bugs, where non-portability arises due to the program exercising behaviours that are incorrect or undefined according to the OpenCL specification. The issues we encountered slowed the development process associated with our sets of applications, but we view the issues as providing exciting motivation for future testing and verification efforts to improve the state of OpenCL portability; we conclude with a discussion of these.

Programming Language Design and Implementation | 2018

The semantics of transactions and weak memory in x86, Power, ARM, and C++

Nathan Chong; Tyler Sorensen; John Wickerson

Weak memory models provide a complex, system-centric semantics for concurrent programs, while transactional memory (TM) provides a simpler, programmer-centric semantics. Both have been studied in detail, but their combined semantics is not well understood. This is problematic because such widely-used architectures and languages as x86, Power, and C++ all support TM, and all have weak memory models. Our work aims to clarify the interplay between weak memory and TM by extending existing axiomatic weak memory models (x86, Power, ARMv8, and C++) with new rules for TM. Our formal models are backed by automated tooling that enables (1) the synthesis of tests for validating our models against existing implementations and (2) the model-checking of TM-related transformations, such as lock elision and compiling C++ transactions to hardware. A key finding is that a proposed TM extension to ARMv8 currently being considered within ARM Research is incompatible with lock elision without sacrificing portability or performance.

Foundations of Software Engineering | 2017

Cooperative kernels: GPU multitasking for blocking algorithms

Tyler Sorensen; Hugues Evrard; Alastair F. Donaldson

There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today's GPUs in a manner that does not allow the GPU to be shared with other workloads (such as graphics rendering tasks). We propose cooperative kernels, an extension to the traditional GPU programming model geared towards writing blocking algorithms. Workgroups of a cooperative kernel are fairly scheduled, and multitasking is supported via a small set of language extensions through which the kernel and scheduler cooperate. We describe a prototype implementation of a cooperative kernel framework implemented in OpenCL 2.0 and evaluate our approach by porting a set of blocking GPU applications to cooperative kernels and examining their performance under multitasking. Our prototype exploits no vendor-specific hardware, driver or compiler support, so our results provide a lower bound on the efficiency with which cooperative kernels can be implemented in practice.

Symposium on Principles of Programming Languages | 2017