Joon Suan Ong
University of British Columbia
Publications
Featured research published by Joon Suan Ong.
Communications of The ACM | 2001
Yvonne Coady; Gregor Kiczales; Mike Feeley; Norm Hutchinson; Joon Suan Ong
Key elements of operating systems crosscut: their implementation is inherently coupled with several layers of the system. Prefetching, for example, is a critical architectural performance optimization that amortizes the cost of going to disk by predicting and retrieving additional data with each explicit disk request. The implementation of prefetching, however, is tightly coupled with both the high-level context of the request source and the low-level costs of additional retrieval. In a traditional OS implementation, small clusters of customized prefetching code appear at both high and low levels along most execution paths that involve going to disk. This makes prefetching difficult to reason about and change, and interferes with the clarity of the primary functionality within which prefetching is embedded.
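The paper's code is not shown here; as a minimal, language-neutral sketch of the kind of prefetching the abstract describes, the snippet below models sequential readahead, where each explicit disk read also retrieves a small cluster of subsequent blocks. All names (`disk_read`, `read_block`, the window size) are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: sequential readahead amortizes the cost of going
# to disk by fetching extra blocks with each explicit request.

DISK = {b: f"block-{b}" for b in range(64)}  # stand-in for an on-disk file

def disk_read(block_no):
    return DISK[block_no]

def read_block(block_no, cache, window=4):
    """Serve one explicit request, then prefetch a small cluster behind it."""
    if block_no not in cache:
        cache[block_no] = disk_read(block_no)
    # The prefetch cluster couples high-level context (sequential access is
    # assumed) with low-level retrieval cost (the window size) -- the coupling
    # the abstract identifies as crosscutting.
    for b in range(block_no + 1, min(block_no + 1 + window, len(DISK))):
        cache.setdefault(b, disk_read(b))
    return cache[block_no]

cache = {}
read_block(0, cache)
# blocks 1-4 are now resident without further explicit requests
```

In a real kernel, variants of this cluster are scattered along every execution path that reaches the disk, which is exactly why the paper treats prefetching as a crosscutting concern.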
high performance distributed computing | 1999
Yvonne Coady; Joon Suan Ong; Michael J. Feeley
Advances in network technology continue to improve the communication performance of workstation and PC clusters, making high-performance workstation-cluster computing increasingly viable. These hardware advances, however, are taxing traditional host-software network protocols to the breaking point. A modern gigabit network can swamp a host's I/O bus and processor, limiting communication performance and slowing computation unacceptably. Fortunately, the host-programmable network processors used by these networks present a potential solution. Offloading selected host processing to these embedded network processors lowers host overhead and improves latency. This paper examines the use of embedded network processors to improve the performance of workstation-cluster global memory management. We have implemented a revised version of the GMS global memory system that reduces host overhead by as much as 29% on active nodes and improves page-fault latency by as much as 39%.
acm sigops european workshop | 2002
Yvonne Coady; Gregor Kiczales; Joon Suan Ong; Andrew Warfield; Michael J. Feeley
As OS code moves to new settings, it must be continually reshaped. Kernel code, however, is notoriously brittle -- a small, seemingly localized change can break disparate parts of the system simultaneously. The problem is that the implementations of some system concerns are not modular because they naturally crosscut the system structure. Aspect-oriented programming proposes new mechanisms to enable the modular implementation of crosscutting concerns. This paper evaluates aspect-oriented programming in the context of two crosscutting concerns in a FreeBSD 4.4 kernel -- page daemon activation and disk quotas. We present the ways in which aspects allowed us to make these implementations modular, their impact on comprehensibility and configurability, and the costs of supporting a prototype aspect-oriented runtime environment.
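The paper's prototype targets C kernel code with an AspectC-style environment; as a language-neutral analogy only, the sketch below uses a Python decorator to play the role of "advice" woven around a "join point", keeping one of the paper's example concerns (disk quotas) out of the primary write path. All names and the quota limit are hypothetical.

```python
from functools import wraps

# Analogy only: a decorator stands in for aspect advice. The quota concern
# is stated once, here, instead of being scattered through every write path.

quota_used = {"alice": 0}
QUOTA_LIMIT = 100

def quota_advice(fn):
    @wraps(fn)
    def wrapper(user, nbytes):
        # before-advice: enforce the disk quota at every write join point
        if quota_used[user] + nbytes > QUOTA_LIMIT:
            raise PermissionError("quota exceeded")
        result = fn(user, nbytes)
        # after-advice: account for the bytes actually written
        quota_used[user] += nbytes
        return result
    return wrapper

@quota_advice
def write_blocks(user, nbytes):
    # primary functionality: no quota logic tangled into it
    return nbytes

write_blocks("alice", 60)
```

The configurability benefit the paper measures follows directly from this shape: removing the concern means removing the aspect, not editing every write path.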
high performance distributed computing | 2000
Jasmine Y. Q. Wang; Joon Suan Ong; Yvonne Coady; Michael J. Feeley
The benefits of Markov-based predictive prefetching have been largely overshadowed by the overhead required to produce high-quality predictions. While both theoretical and simulation results for prediction algorithms appear promising, substantial limitations exist in practice, largely because practical implementations must make compromises to reduce overhead. These compromises limit the level of algorithm complexity, the variety of access patterns, and the granularity of trace data that the implementation supports. This paper describes the design and implementation of GMS-3P (Global Memory System with Parallel Predictive Prefetching), an operating-system kernel extension that offloads prediction overhead to idle network nodes. GMS-3P builds on the GMS global memory system, which pages to and from remote workstation memory. In GMS-3P, the target node sends an online trace of an application's page faults to an idle node that is running a Markov-based prediction algorithm. The prediction node then uses GMS to prefetch pages to the target node from the memory of other workstations in the network. Our preliminary results show that predictive prefetching can reduce the remote-memory page-fault time by 60% or more and that, by offloading prediction overhead to an idle node, GMS-3P can reduce this improved latency by between 24% and 44%, depending on the order of the Markov model.
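To make the prediction step concrete, here is a minimal first-order sketch of the family of Markov predictors GMS-3P offloads: learn page-to-page transition counts from the fault trace, then rank likely successors as prefetch candidates. The real system supports higher-order models and an online trace protocol; everything below is an illustrative assumption.

```python
from collections import defaultdict, Counter

# Minimal first-order Markov predictor over a page-fault trace. The
# prediction node would run logic like this and issue prefetches for the
# top-ranked successors of each faulting page.

transitions = defaultdict(Counter)

def observe(trace):
    """Learn page-to-page transition counts from a fault trace."""
    for prev, nxt in zip(trace, trace[1:]):
        transitions[prev][nxt] += 1

def predict(page, k=2):
    """Return up to k most likely next pages: candidates for prefetching."""
    return [p for p, _ in transitions[page].most_common(k)]

observe([1, 2, 3, 1, 2, 4, 1, 2, 3])
predict(2)  # pages worth prefetching after a fault on page 2
```

The overhead GMS-3P moves off the target node is exactly this bookkeeping and ranking, which grows quickly with model order and trace granularity.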
Archive | 2003
Michael J. Feeley; Joon Suan Ong
This thesis describes the design and implementation of NetVM, a network interface that supports user-mode access, zero-copy transfer and sender-managed communication without pinning source or destination memory. To do this, the network interface maintains a shadow page table, which the host operating system updates whenever it maps or unmaps a page in host memory. The network interface uses this table to briefly lock and translate the virtual address of a page when it accesses that page for DMA transfer. The operating system cannot replace a page during the short interval in which the network interface holds it locked. If a destination page is not resident in memory, the network interface redirects the data to an intermediate system buffer, which the operating system uses to complete the transfer with a single host-to-host memory copy after paging in the required page. A credit-based flow-control scheme prevents the system buffer from overflowing. Application-level DMA transfers only data. To support control transfers, NetVM implements a counter-based notification mechanism that lets applications issue and detect notifications. The sending application increments an event counter by specifying its identifier in an RDMA write operation. The receiving application detects the event by busy waiting, block waiting or triggering a user-defined handler whenever the notifying write completes. This range of detection mechanisms allows the application to choose the appropriate tradeoff between signaling latency and processor overhead. NetVM enforces ordered notifications over an out-of-order delivery network by using a sequence window. The NetVM prototype is implemented in firmware for the Myrinet LANai-9.2 and integrated with the FreeBSD 4.6 virtual memory system.
NetVM's memory-management overhead is low: it adds less than 5.0% to write latency compared with a static pinning approach, and it has a lower pinning cost than a dynamic pinning approach with up to a 94.5% hit rate in the pinned-page cache. Minimum write latency is 5.56μs and maximum throughput is 155.46MB/s, which is 97.2% of the link bandwidth. Transferring control through notification adds between 2.96μs and 17.49μs to the write operation, depending on the detection mechanism used. Compared with standard low-level atomic operations, NetVM adds at most 18.2% and 12.6% to application latencies for high-level wait-queue and counting-semaphore operations, respectively. (Abstract shortened by UMI.)
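The credit-based flow control mentioned in the abstract can be sketched as follows: a sender may inject a message only while it holds a credit, and the receiver returns a credit each time a buffer slot drains. This is a generic illustration of the scheme, not NetVM's firmware; the names and the buffer size are invented.

```python
# Illustrative credit-based flow control: sender-side credits mirror the
# free slots of the receiver's intermediate system buffer, so the buffer
# can never overflow.

BUFFER_SLOTS = 4

class CreditLink:
    def __init__(self, slots=BUFFER_SLOTS):
        self.credits = slots   # sender-side credits = free receiver slots
        self.buffer = []       # receiver's intermediate system buffer

    def send(self, msg):
        if self.credits == 0:
            return False       # sender must wait: a send now would overflow
        self.credits -= 1
        self.buffer.append(msg)
        return True

    def drain_one(self):
        """Receiver consumes a slot (e.g. after paging in the destination
        page) and implicitly returns a credit to the sender."""
        msg = self.buffer.pop(0)
        self.credits += 1
        return msg

link = CreditLink()
sent = [link.send(i) for i in range(5)]  # fifth send is refused
```

The invariant is that credits plus occupied slots always equal the buffer size, which is what makes overflow impossible rather than merely unlikely.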
european conference on object-oriented programming | 2001
Yvonne Coady; Gregor Kiczales; Michael J. Feeley; Norman C. Hutchinson; Joon Suan Ong
Archive | 2001
Yvonne Coady; Alex Brodsky; Dima Brodsky; Jody Pomkoski; Stephan Gudmundson; Joon Suan Ong; Gregor Kiczales
IEEE Computer Society | 2001
Yvonne Coady; Gregor Kiczales; Michael J. Feeley; Norman C. Hutchinson; Joon Suan Ong; Stephan Gudmundson
workshop on hot topics in operating systems | 2001
Yvonne Coady; Gregor Kiczales; Michael J. Feeley; Norman C. Hutchinson; Joon Suan Ong; Stephan Gudmundson