Theodore M. Wong
IBM
Publication
Featured research published by Theodore M. Wong.
international symposium on neural networks | 2013
Andrew S. Cassidy; Paul A. Merolla; John V. Arthur; Steven K. Esser; Bryan L. Jackson; Rodrigo Alvarez-Icaza; Pallab Datta; Jun Sawada; Theodore M. Wong; Vitaly Feldman; Arnon Amir; Daniel Ben Dayan Rubin; Filipp Akopyan; Emmett McQuinn; William P. Risk; Dharmendra S. Modha
Marching along the DARPA SyNAPSE roadmap, IBM unveils a trilogy of innovations towards the TrueNorth cognitive computing system inspired by the brain's function and efficiency. Judiciously balancing the dual objectives of functional capability and implementation/operational cost, we develop a simple, digital, reconfigurable, versatile spiking neuron model that supports one-to-one equivalence between hardware and simulation and is implementable using only 1272 ASIC gates. Starting with the classic leaky integrate-and-fire neuron, we add: (a) configurable and reproducible stochasticity to the input, the state, and the output; (b) four leak modes that bias the internal state dynamics; (c) deterministic and stochastic thresholds; and (d) six reset modes for rich finite-state behavior. The model supports a wide variety of computational functions and neural codes. We capture 50+ neuron behaviors in a library for hierarchical composition of complex computations and behaviors. Although designed with cognitive algorithms and applications in mind, serendipitously, the neuron model can qualitatively replicate the 20 biologically-relevant behaviors of a dynamical neuron model.
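As a rough reading aid for the ingredients listed in this abstract (a leak term, a stochastic threshold, and configurable reset behavior), the following Python sketch implements a toy leaky integrate-and-fire neuron. All class, parameter, and mode names here are hypothetical illustrations and are not taken from the TrueNorth neuron model itself.

```python
import random

class SimpleLIFNeuron:
    """Toy leaky integrate-and-fire neuron with a configurable leak,
    an optional stochastic threshold, and two illustrative reset modes."""

    def __init__(self, leak=-1, threshold=16, threshold_jitter=0,
                 reset_mode="zero", reset_value=0, seed=0):
        self.v = 0                                 # membrane potential (integer state)
        self.leak = leak                           # added to the state every tick
        self.threshold = threshold
        self.threshold_jitter = threshold_jitter   # range of stochastic threshold offset
        self.reset_mode = reset_mode               # "zero" or "subtract"
        self.reset_value = reset_value
        self.rng = random.Random(seed)             # seeded for reproducible stochasticity

    def tick(self, synaptic_input):
        """Integrate one timestep of input; return True if the neuron spikes."""
        self.v += synaptic_input + self.leak
        th = self.threshold + self.rng.randint(0, self.threshold_jitter)
        if self.v >= th:
            if self.reset_mode == "zero":
                self.v = self.reset_value
            else:                                  # "subtract": keep residual charge
                self.v -= th
            return True
        return False

# Drive the neuron with a constant input and count spikes over 100 ticks.
neuron = SimpleLIFNeuron(leak=-1, threshold=16, threshold_jitter=4)
spikes = sum(neuron.tick(5) for _ in range(100))
print(f"{spikes} spikes in 100 ticks")
```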
international symposium on neural networks | 2013
Steven K. Esser; Alexander Andreopoulos; Rathinakumar Appuswamy; Pallab Datta; Davis; Arnon Amir; John V. Arthur; Andrew S. Cassidy; Myron Flickner; Paul Merolla; Shyamal Chandra; Nicola Basilico; Stefano Carpin; Tom Zimmerman; Frank Zee; Rodrigo Alvarez-Icaza; Jeffrey A. Kusnitz; Theodore M. Wong; William P. Risk; Emmett McQuinn; Tapan Kumar Nayak; Raghavendra Singh; Dharmendra S. Modha
Marching along the DARPA SyNAPSE roadmap, IBM unveils a trilogy of innovations towards the TrueNorth cognitive computing system inspired by the brain's function and efficiency. The non-von Neumann nature of the TrueNorth architecture necessitates a novel approach to efficient system design. To this end, we have developed a set of abstractions, algorithms, and applications that are natively efficient for TrueNorth. First, we developed repeatedly-used abstractions that span neural codes (such as binary, rate, population, and time-to-spike), long-range connectivity, and short-range connectivity. Second, we implemented ten algorithms that include convolution networks, spectral content estimators, liquid state machines, restricted Boltzmann machines, hidden Markov models, looming detection, temporal pattern matching, and various classifiers. Third, we demonstrate seven applications that include speaker recognition, music composer recognition, digit recognition, sequence prediction, collision avoidance, optical flow, and eye detection. Our results showcase the parallelism, versatility, rich connectivity, spatio-temporality, and multi-modality of the TrueNorth architecture as well as compositionality of the corelet programming paradigm and the flexibility of the underlying neuron model.
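Two of the neural codes named in this abstract, rate and time-to-spike, can be illustrated with toy encoders that turn a scalar into a spike train. The sketch below is an illustrative simplification under assumed conventions, not the paper's corelet implementations.

```python
def rate_code(value, num_ticks, max_value):
    """Toy rate code: the number of spikes over num_ticks is proportional to
    the encoded value (spikes spread evenly for readability)."""
    n_spikes = round(num_ticks * value / max_value)
    if n_spikes == 0:
        return [0] * num_ticks
    spike_ticks = {round(i * num_ticks / n_spikes) for i in range(n_spikes)}
    return [1 if t in spike_ticks else 0 for t in range(num_ticks)]

def time_to_spike_code(value, num_ticks, max_value):
    """Toy time-to-spike code: a single spike whose latency encodes the value;
    larger values spike earlier."""
    latency = round((num_ticks - 1) * (1 - value / max_value))
    return [1 if t == latency else 0 for t in range(num_ticks)]

print(rate_code(3, 8, 8))           # three spikes spread over eight ticks
print(time_to_spike_code(6, 8, 8))  # one early spike for a large value
```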
international symposium on neural networks | 2013
Arnon Amir; Pallab Datta; William P. Risk; Andrew S. Cassidy; Jeffrey A. Kusnitz; Steven K. Esser; Alexander Andreopoulos; Theodore M. Wong; Myron Flickner; Rodrigo Alvarez-Icaza; Emmett McQuinn; Benjamin Shaw; Norm Pass; Dharmendra S. Modha
Marching along the DARPA SyNAPSE roadmap, IBM unveils a trilogy of innovations towards the TrueNorth cognitive computing system inspired by the brain's function and efficiency. The sequential programming paradigm of the von Neumann architecture is wholly unsuited for TrueNorth. Therefore, as our main contribution, we develop a new programming paradigm that permits construction of complex cognitive algorithms and applications while being efficient for TrueNorth and effective for programmer productivity. The programming paradigm consists of (a) an abstraction for a TrueNorth program, named Corelet, for representing a network of neurosynaptic cores that encapsulates all details except external inputs and outputs; (b) an object-oriented Corelet Language for creating, composing, and decomposing corelets; (c) a Corelet Library that acts as an ever-growing repository of reusable corelets from which programmers compose new corelets; and (d) an end-to-end Corelet Laboratory that is a programming environment which integrates with the TrueNorth architectural simulator, Compass, to support all aspects of the programming cycle from design, through development, debugging, and up to deployment. The new paradigm seamlessly scales from a handful of synapses and neurons to networks of neurosynaptic cores of progressively increasing size and complexity. The utility of the new programming paradigm is underscored by the fact that we have designed and implemented more than 100 algorithms as corelets for TrueNorth in a very short time span.
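To make the encapsulation-and-composition idea concrete, here is a purely hypothetical Python sketch of a "corelet-like" object: internals are hidden, only named inputs and outputs are exposed, and two such objects can be composed into a new one by wiring outputs to inputs. The class, method, and port names are invented for illustration; they do not reflect the actual Corelet Language API.

```python
class Corelet:
    """Hypothetical sketch of the composition idea only: internals are
    encapsulated, and only named external input/output ports are visible."""

    def __init__(self, name, inputs, outputs):
        self.name = name
        self.inputs = list(inputs)       # externally visible input ports
        self.outputs = list(outputs)     # externally visible output ports
        self._internal = []              # encapsulated cores / sub-corelets

    @staticmethod
    def compose(name, first, second, wiring):
        """Build a new corelet by wiring some outputs of `first` to inputs of
        `second`; unwired ports become the new corelet's external ports."""
        used_out = set(wiring.keys())
        used_in = set(wiring.values())
        new = Corelet(
            name,
            first.inputs + [p for p in second.inputs if p not in used_in],
            [p for p in first.outputs if p not in used_out] + second.outputs,
        )
        new._internal = [first, second, wiring]
        return new

# Usage: compose a feature extractor with a classifier into one reusable unit.
edges = Corelet("edge_detector", inputs=["pixels"], outputs=["edges"])
classify = Corelet("classifier", inputs=["edges"], outputs=["label"])
pipeline = Corelet.compose("edge_classifier", edges, classify, {"edges": "edges"})
print(pipeline.inputs, pipeline.outputs)   # ['pixels'] ['label']
```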
ieee international conference on high performance computing data and analytics | 2012
Robert Preissl; Theodore M. Wong; Pallab Datta; Myron Flickner; Raghavendra Singh; Steven K. Esser; William P. Risk; Horst D. Simon; Dharmendra S. Modha
Inspired by the function, power, and volume of the organic brain, we are developing TrueNorth, a novel modular, non-von Neumann, ultra-low power, compact architecture. TrueNorth consists of a scalable network of neurosynaptic cores, with each core containing neurons, dendrites, synapses, and axons. To set sail for TrueNorth, we developed Compass, a multi-threaded, massively parallel functional simulator and a parallel compiler that maps a network of long-distance pathways in the macaque monkey brain to TrueNorth. We demonstrate near-perfect weak scaling on a 16 rack IBM® Blue Gene®/Q (262144 CPUs, 256 TB memory), achieving an unprecedented scale of 256 million neurosynaptic cores containing 65 billion neurons and 16 trillion synapses running only 388x slower than real time with an average spiking rate of 8.1 Hz. By using emerging PGAS communication primitives, we also demonstrate 2x better real-time performance over MPI primitives on a 4 rack Blue Gene/P (16384 CPUs, 16 TB memory).
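A back-of-the-envelope calculation using only the figures quoted in this abstract gives a sense of the simulation throughput Compass sustained; the arithmetic below is illustrative and not taken from the paper.

```python
# Figures quoted in the abstract above.
neurons = 65e9            # simulated neurons
synapses = 16e12          # simulated synapses
spike_rate_hz = 8.1       # average spiking rate per neuron (simulated time)
slowdown = 388            # wall-clock time / simulated time

# Spikes generated per second of *simulated* time across the whole model.
spikes_per_sim_second = neurons * spike_rate_hz

# The simulation runs 388x slower than real time, so divide to get spikes
# processed per second of wall-clock time.
spikes_per_wall_second = spikes_per_sim_second / slowdown

print(f"{spikes_per_sim_second:.2e} spikes per simulated second")
print(f"{spikes_per_wall_second:.2e} spikes per wall-clock second")
print(f"{synapses / neurons:.0f} synapses per neuron on average")
```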
european conference on computer systems | 2008
Anna Povzner; Tim Kaldewey; Scott A. Brandt; Richard A. Golding; Theodore M. Wong; Carlos Maltzahn
Guaranteed I/O performance is needed for a variety of applications ranging from real-time data collection to desktop multimedia to large-scale scientific simulations. Reservations on throughput, the standard measure of disk performance, fail to effectively manage disk performance due to the orders of magnitude difference between best-, average-, and worst-case response times, allowing reservation of less than 0.01% of the achievable bandwidth. We show that by reserving disk resources in terms of utilization it is possible to create a disk scheduler that supports reservation of nearly 100% of the disk resources, provides arbitrarily hard or soft guarantees depending upon application needs, and yields efficiency as good or better than best-effort disk schedulers tuned for performance. We present the architecture of our scheduler, prove the correctness of its algorithms, and provide results demonstrating its effectiveness.
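The contrast between throughput-based and utilization-based reservation can be illustrated with a small admission-control sketch. The response times and utilization figures below are invented for illustration and are not the paper's measurements.

```python
# Invented, illustrative numbers: why reserving raw throughput is pessimistic.
best_case_s = 0.0001    # request served from cache or sequentially
worst_case_s = 0.025    # request paying a full seek plus rotational latency

# Throughput-based admission must assume every request hits the worst case,
# so only a small fraction of the achievable request rate can be promised.
guaranteed_iops = 1.0 / worst_case_s     # 40 requests/s
achievable_iops = 1.0 / best_case_s      # 10,000 requests/s
print(f"reservable fraction of best-case rate: {guaranteed_iops / achievable_iops:.2%}")

# Utilization-based admission instead reserves a share of disk *time*:
# a new stream is admitted as long as total reserved utilization stays <= 100%.
existing_reservations = [0.30, 0.25, 0.20]   # fractions of disk time already reserved
new_stream = 0.20
admitted = sum(existing_reservations) + new_stream <= 1.0
print(f"admit a new stream reserving 20% of disk time: {admitted}")
```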
real time technology and applications symposium | 2006
Theodore M. Wong; Richard A. Golding; Caixue Lin; Ralph A. Becker-Szendy
Large-scale storage systems often hold data for multiple applications and users. A problem in such systems is isolating applications and users from each other to prevent their workloads from interacting in unexpected ways. Another is ensuring that each application receives an appropriate level of performance. As part of the solution to these problems, we have designed a hierarchical I/O scheduling algorithm to manage performance resources on an underlying storage device. Our algorithm uses a simple allocation abstraction: an application or user has a corresponding pool of throughput, and manages throughput within its pool by opening sessions. The algorithm ensures that each pool and session receives at least a reserve rate of throughput and caps usage at a limit rate, using hierarchical token buckets and earliest-deadline-first (EDF) I/O scheduling. Once it has fulfilled the reserves of all active sessions and pools, it shares unused throughput fairly among active sessions and pools such that they tend to receive the same amount. It thus combines deadline scheduling with proportional-style resource sharing in a novel way. We assume that the device performs its own low-level head scheduling, rather than modeling the device in detail. Our implementation shows the correctness of our algorithm, imposes little overhead on the system, and achieves throughput nearly equal to that of an unmanaged device.
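As a minimal illustration of the reserve/limit abstraction described above, the sketch below implements a single token bucket and uses one bucket for a session's reserve rate and one for its limit rate. The hierarchical pools, fair sharing, and EDF ordering are omitted, and all rates are hypothetical.

```python
import time

class TokenBucket:
    """Toy token bucket: tokens accrue at `rate` per second up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.capacity = burst
        self.tokens = burst
        self.last = time.monotonic()

    def try_consume(self, n=1):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

# One session with a reserve of 100 IOPS and a limit of 400 IOPS (hypothetical).
reserve = TokenBucket(rate=100, burst=10)
limit = TokenBucket(rate=400, burst=40)

def classify_next_io():
    """Guaranteed if within the reserve, best-effort if only within the limit
    (it would then compete for the pool's unused throughput), else refused."""
    if not limit.try_consume():
        return "refused: over the limit rate"
    if reserve.try_consume():
        return "guaranteed: within the reserve rate"
    return "best-effort: shares unused pool throughput"

print(classify_next_io())
```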
AIAA SPACE 2008 Conference & Exposition | 2008
David LoBosco; Richard A. Golding; Glen Cameron; Theodore M. Wong
In recent years, national security space systems have been plagued by cost overruns, schedule slips, and requirements creep. Notable programs have received much coverage in the press after incurring multiple Nunn-McCurdy cost growth breaches (total program cost increase by greater than 25%) or being cancelled outright. While these programs are the most visible, they are merely examples of the widespread problems with the government’s and industry’s approach to military space. While a spacecraft that is never launched obviously provides a poor return on government investment, those spacecraft that have reached orbit may be equally bad investments if mission needs have changed or a critical component fails soon after launch. Recent studies suggest that fractionated space systems can produce a higher value return on investment than traditional monolithic systems. Rather than continuing to design large monolithic satellites with unrealizable and competing requirements for performance and lifetime, we plan to develop a fractionated space system architecture, informed by value-centric engineering, that enables rapid initial operational capability through staged deployment, flexibility to changing national security needs, and robustness against attack and failures.

The Pleiades architecture implements fractionated space systems, where the system’s functionality is spread over multiple, heterogeneous spacecraft modules. Each spacecraft module is a free-flying entity that has its own set of typical spacecraft bus functions but carries a unique mission payload or a resource such as a mission data processor, solid-state recorder, or high-bandwidth downlink. The spacecraft modules fly in a cluster and communicate with each other through a shared wireless network. The Pleiades architecture allows for rapid response to operational needs, whether by launching new spacecraft modules to join a cluster or by retasking existing spacecraft resources. Modules carrying new payloads or resources are produced more quickly and at lower cost than traditional systems by using commercial space best practices, single-string designs (system reliability is achieved through redundancy across spacecraft modules), and the benefits of economies of scale. This can lower the barrier to deploying space-based assets for agencies that cannot currently afford them. Also, the cluster is less vulnerable to attack, and can disperse itself when there is an incoming threat. When a failure does happen, the architecture provides for a smooth degradation in capability until a replacement can be launched and brought into the cluster.

The fundamental innovation of this architecture is that information integration, not physical decomposition, is the key to realizing fractionation. In current space systems, the main inhibitor to rapid evolution and adaptation is stovepiped architectures that inhibit movement of information between elements, including the ground, and prevent reconfiguration of information flow to meet evolving mission needs. A fractionated space
petascale data storage workshop | 2007
David O. Bigelow; Suresh Iyer; Tim Kaldewey; Roberto C. Pineiro; Anna Povzner; Scott A. Brandt; Richard A. Golding; Theodore M. Wong; Carlos Maltzahn
Many applications (for example, scientific simulation, real-time data acquisition, and distributed reservation systems) have I/O performance requirements, yet most large, distributed storage systems lack the ability to guarantee I/O performance. We are working on end-to-end performance management in scalable, distributed storage systems. The kinds of storage systems we are targeting include large high-performance computing (HPC) clusters, which require both large data volumes and high I/O rates, as well as large-scale general-purpose storage systems.
Archive | 2005
Richard A. Golding; Theodore M. Wong; Omer Ahmed Zaki
Archive | 2005
Richard A. Golding; Theodore M. Wong; Omer Ahmed Zaki