
William S. Song

Massachusetts Institute of Technology


Publication

Featured research published by William S. Song.

asilomar conference on signals, systems and computers | 1997

A new 3-GSPS 65-GOPS UHF digital radar receiver and its performance characteristics

William S. Song

A new UHF direct RF sampling digital radar receiver was developed for an airborne early warning application. The digital receiver samples the input directly at RF frequencies with an 8-bit 3 billion samples per second analog-to-digital converter and performs the down-conversion in the digital domain. In order to meet the computational throughput requirement associated with the high-speed digital down-conversion, a 65 billion operations per second full custom signal processor chip-set was developed using the bit-level systolic array architecture. The design considerations and the performance measurements of the prototype digital receiver and its subcomponents are presented.
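The digital down-conversion step described in this abstract can be sketched in a few lines. This is a minimal floating-point illustration, not the chip-set's bit-level systolic design: the function name, its parameters, and the Hamming-windowed sinc low-pass filter are all assumptions made for the example.

```python
import cmath
import math

def digital_down_convert(samples, fs, f_center, decim, num_taps=33):
    """Mix real samples to baseband, low-pass filter, and decimate."""
    # Step 1: multiply by a complex exponential to shift f_center to 0 Hz.
    mixed = [x * cmath.exp(-2j * math.pi * f_center * n / fs)
             for n, x in enumerate(samples)]
    # Step 2: windowed-sinc low-pass FIR, cutoff at fs / (2 * decim).
    cutoff = 0.5 / decim  # in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        if t == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    taps = [t / gain for t in taps]  # normalize to unity gain at DC
    # Step 3: evaluate the filter only at every decim-th output sample.
    out = []
    for n in range(num_taps - 1, len(samples), decim):
        out.append(sum(taps[k] * mixed[n - k] for k in range(num_taps)))
    return out
```

Feeding in a unit-amplitude cosine at `f_center` should give a nearly constant complex baseband value close to 0.5, since the mixer splits the real tone into a DC term and an image that the low-pass filter rejects.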

ieee high performance extreme computing conference | 2017

Static graph challenge: Subgraph isomorphism

Siddharth Samsi; Vijay Gadepally; Michael B. Hurley; Michael Jones; Edward K. Kao; Sanjeev Mohindra; Paul Monticciolo; Albert Reuther; Steven Smith; William S. Song; Diane Staheli; Jeremy Kepner

The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges to move these communities forward. The proposed Subgraph Isomorphism Graph Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a graph challenge that is reflective of many real-world graph analytics processing systems. The Subgraph Isomorphism Graph Challenge is a holistic specification with multiple integrated kernels that can be run together or independently. Each kernel is well defined mathematically and can be implemented in any programming environment. Subgraph isomorphism is amenable to both vertex-centric implementations and array-based implementations (e.g., using the Graph-BLAS.org standard). The computations are simple enough that performance predictions can be made based on simple computing hardware models. The surrounding kernels provide the context for each kernel that allows rigorous definition of both the input and the output for each kernel. Furthermore, since the proposed graph challenge is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Serial implementations in C++, Python, Python with Pandas, Matlab, Octave, and Julia have been implemented and their single-threaded performance has been measured. Specifications, data, and software are publicly available at GraphChallenge.org.
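To make the contrast between vertex-centric and array-based implementations concrete, the sketch below counts triangles both ways. Triangle counting is used here only as an assumed, simple instance of subgraph isomorphism; the abstract does not name its kernels, and the function names and dense-matrix representation are illustrative choices. Edges are assumed to be undirected, listed once, with no self-loops.

```python
def count_triangles_vertex_centric(edges):
    """Count triangles by intersecting the neighbor sets of each edge."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    total = 0
    for u, v in edges:
        # Every common neighbor of u and v closes a triangle on this edge.
        total += len(neighbors[u] & neighbors[v])
    return total // 3  # each triangle is found once per edge

def count_triangles_array_based(edges, n):
    """Count triangles as trace(A^3) / 6 for a dense adjacency matrix A."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    # a2 = A * A, computed with plain nested loops.
    a2 = [[sum(a[i][k] * a[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # trace(A^3) = sum over i, k of (A^2)[i][k] * A[k][i].
    trace_a3 = sum(a2[i][k] * a[k][i] for i in range(n) for k in range(n))
    return trace_a3 // 6  # each triangle appears 6 times on the diagonal
```

Both functions should agree on any simple undirected graph. A production array-based version would use sparse matrices rather than the dense lists shown here.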

conference on advanced signal processing algorithms architectures and implementations | 2001

Adaptive array beamforming with fixed-point arithmetic matrix inversion using Givens rotations

Daniel V. Rabinkin; William S. Song; M. Michael Vai; Huy T. Nguyen

Adaptive array systems require the periodic solution of the well-known $w = R^{-1}v$ equation in order to compute optimum adaptive array weights. The covariance matrix R is estimated by forming a product of noise sample matrices X: $R = X^H X$. The operations-count cost of performing the required matrix inversion in real time can be prohibitively high for a high bandwidth system with a large number of sensors. Specialized hardware may be required to execute the requisite computations in real time. The choice of algorithm to perform these computations must be considered in conjunction with the hardware technology used to implement the computation engine. A systolic architecture implementation of the Givens rotation method for matrix inversion was selected to perform adaptive weight computation. The bit-level systolic approach enables a simple ASIC design and a very low power implementation. The bit-level systolic architecture must be implemented with fixed-point arithmetic to simplify the propagation of data through the computation cells. The Givens rotation approach has a highly parallel implementation and is ideally suited for a systolic implementation. Additionally, the adaptive weights are computed directly from the sample matrix X in the voltage domain, thus reducing the required dynamic range needed in carrying out the computations. An analysis was performed to determine the required fixed-point precision needed to compute the weights for an adaptive array system operating in the presence of interference. Based on the analysis results, it was determined that the precision of a floating-point computation can be well approximated with a 13-bit to 19-bit word length fixed point computation for typical system jammer-to-noise levels. This property has produced an order-of-magnitude reduction in required hardware complexity. A synthesis-based ASIC design process was used to generate preliminary layouts. These layouts were used to estimate the area and throughput of the VLSI QR decomposition architecture. The results show that this QR decomposition process, when implemented into a full-custom design, provides a computation time that is two orders of magnitude faster than a state-of-the-art microprocessor.
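The idea of computing the weights directly from the sample matrix X, without ever forming the covariance matrix, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses real-valued data and floating-point arithmetic rather than the complex, fixed-point systolic design in the paper, and the function names and the symbol U for the triangular factor are hypothetical. Givens rotations reduce X to an upper-triangular U with X^T X = U^T U, so the weight equation is solved with one forward and one back substitution.

```python
import math

def givens_triangularize(x):
    """Reduce an m-by-n real matrix (m >= n) to upper-triangular form."""
    a = [row[:] for row in x]
    m, n = len(a), len(a[0])
    for col in range(n):
        for row in range(col + 1, m):
            r = math.hypot(a[col][col], a[row][col])
            if r == 0.0:
                continue  # nothing to rotate
            c, s = a[col][col] / r, a[row][col] / r
            # Rotate the two rows so that a[row][col] becomes zero.
            for j in range(col, n):
                top, bot = a[col][j], a[row][j]
                a[col][j] = c * top + s * bot
                a[row][j] = -s * top + c * bot
    return [a[i][:n] for i in range(n)]  # n-by-n triangular factor U

def adaptive_weights(x, v):
    """Solve (X^T X) w = v using only the triangular factor U of X."""
    u = givens_triangularize(x)
    n = len(u)
    # Forward substitution: U^T y = v.
    y = [0.0] * n
    for i in range(n):
        y[i] = (v[i] - sum(u[k][i] * y[k] for k in range(i))) / u[i][i]
    # Back substitution: U w = y.
    w = [0.0] * n
    for i in reversed(range(n)):
        acc = sum(u[i][k] * w[k] for k in range(i + 1, n))
        w[i] = (y[i] - acc) / u[i][i]
    return w
```

Because U has entries on the scale of X rather than of X^T X, the intermediate values span roughly half the dynamic range (in dB) of the covariance matrix, which matches the abstract's point about working in the voltage domain.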

asilomar conference on signals, systems and computers | 1994

VLSI bit-level systolic array for radar front-end signal processing

William S. Song

A very-high-speed radar front-end signal processing CMOS VLSI chip-set using a fully efficient bit-level systolic array architecture has been developed by MIT Lincoln Laboratory. The chip-set performs baseband quadrature sampling, channel equalization, pulse compression, and digital beamforming. The highly pipelined fully efficient bit-level systolic architecture and the highly optimized scalable CMOS VLSI cell library design give the chip-set extremely high performance. The chip-set uses an efficient 4:1 down-sampling baseband quadrature sampling architecture with reduced computational requirement. The chip-set and the cell library have potential in a variety of applications such as communications and medical imaging.

ieee high performance extreme computing conference | 2017

Streaming graph challenge: Stochastic block partition

Edward K. Kao; Vijay Gadepally; Michael B. Hurley; Michael Jones; Jeremy Kepner; Sanjeev Mohindra; Paul Monticciolo; Albert Reuther; Siddharth Samsi; William S. Song; Diane Staheli; Steven Smith

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of the real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm, as well as metrics with detailed documentation, are available at GraphChallenge.org.

asilomar conference on signals, systems and computers | 2000

High-performance low-power polyphase channelizer chip-set

William S. Song; Michael Vai; Huy T. Nguyen; A.H. Horst

A very-high-performance ultra-low-power signal processor chip-set has been developed for wideband adaptive radar and communications applications. The chip-set consists of a polyphase filter chip and a fast Fourier transform (FFT) chip. The chip-set performs polyphase channelization, which channelizes wideband digital data into multiple narrow subbands. The subsequent signal processing tasks such as adaptive beamforming, pulse compression, and space-time adaptive processing (STAP) are performed in the subband domain to mitigate dispersion effects. The power efficiency is achieved through highly optimized VLSI bit-level semi-systolic array technology. The chip-set was fabricated on a 0.25 micron bulk silicon CMOS process with the total two-chip die area of 1.6 square centimeters. The chip-set performs 54 billion arithmetic operations per second on 1.3 watts of power with 41 billion operations per second per watt power efficiency.
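The two-stage structure described here, a polyphase filter followed by an FFT, can be sketched as a critically sampled channelizer. This is a minimal floating-point illustration, not the chip-set's design: the function name, the argument layout, and the use of a direct DFT in place of an FFT are assumptions made for brevity.

```python
import cmath
import math

def polyphase_channelize(x, h, num_channels):
    """Split x into num_channels subbands, one output vector per block."""
    m = num_channels
    taps_per_branch = len(h) // m
    assert taps_per_branch * m == len(h), "len(h) must be a multiple of M"
    outputs = []
    # Advance by M input samples per output block (critical sampling).
    for start in range(0, len(x) - len(h) + 1, m):
        # Polyphase filter stage: branch p sums taps p, p+M, p+2M, ...
        branch = [
            sum(h[p + q * m] * x[start + p + q * m]
                for q in range(taps_per_branch))
            for p in range(m)
        ]
        # Transform stage: an M-point DFT across the branch outputs.
        outputs.append([
            sum(branch[p] * cmath.exp(-2j * math.pi * k * p / m)
                for p in range(m))
            for k in range(m)
        ])
    return outputs
```

Each output block holds one complex sample per subband, so the aggregate sample rate is unchanged while each subband runs at 1/M of the input rate. The per-block cost is one multiply per filter tap plus an M-point transform, which is what makes the polyphase form cheaper than running M separate band-pass filters.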

asilomar conference on signals, systems and computers | 2001

High-performance low-power bit-level systolic array signal processor with low-threshold dynamic logic circuits

William S. Song; Michael Vai; Huy T. Nguyen

MIT Lincoln Laboratory has developed a scalable full-custom cell library for implementing bit-level systolic array signal processors. The cell library achieves high performance and low power consumption by using dynamic logic circuits with low-threshold voltage CMOS devices. The cell library is designed to implement signal processing functions such as finite impulse response (FIR) filter, infinite impulse response (IIR) filter, polyphase filter bank, fast Fourier transform (FFT), inverse fast Fourier transform (IFFT) and matrix operations such as partial product computation and QR decomposition. The full custom cell library is highly optimized for fast clock speed, small area and low power consumption. The low-threshold-voltage dynamic logic devices allow operation at high clock speeds with significantly reduced power supply voltage. The dynamic logic also greatly reduces the device count. The cell library is designed to scale to smaller fabrication geometries. Design automation is also possible by using customized placement and routing software. A FIR filter test chip has been designed, fabricated and tested on a 0.25 μm 2.5 V bulk CMOS process. The clock frequency exceeds 800 MHz running on only 1.3 V power supply; power efficiency up to 250 billion operations/sec/W has been demonstrated using power supply voltage down to 0.4 V.

asilomar conference on signals, systems and computers | 1998

A two trillion operations per second miniaturized mixed signal radar receiver/processor

William S. Song

A new multi-channel UHF miniaturized mixed-signal radar receiver/processor is being developed for an airborne early warning application. Each MCM-based processor/receiver module consists of a high dynamic range, one-stage down-conversion RF receiver, an A/D converter, and a high performance radar signal processor. The signal processor performs the digital in-phase/quadrature down-conversion, channel equalization, and pulse compression functions. Approximately 60 billion arithmetic operations per second are performed by each module. The 32 module chassis performs approximately two trillion operations per second in one cubic foot of space. In order to meet the high computational throughput requirement, a 23 billion operations per second custom VLSI signal processor was developed using the bit-level systolic array architecture.

ieee high performance extreme computing conference | 2016

Novel graph processor architecture, prototype system, and results

William S. Song; Vitaliy Gleyzer; Alexei Lomakin; Jeremy Kepner

Graph algorithms are increasingly used in applications that exploit large databases. However, conventional processor architectures are inadequate for handling the throughput and memory requirements of graph computation. Lincoln Laboratory's graph-processor architecture represents a rethinking of parallel architectures for graph problems. Our processor utilizes innovations that include a sparse matrix-based graph instruction set, a cacheless memory system, accelerator-based architecture, a systolic sorter, high-bandwidth multidimensional toroidal communication network, and randomized communications. A field-programmable gate array (FPGA) prototype of the new graph processor has been developed with significant performance enhancement over conventional processors in graph computational throughput.

radio and wireless symposium | 2016

Informed MIMO implementation of distributed transmit beamforming for range extension

Christopher S. Hayes; Adam R. Margetts; Carol Martin; Huy L. Nguyen; William S. Song; Jeremy B. Muldavin

Radios are ubiquitous today as embedded air interfaces to smartphones, electronic wearables, sensors, and autonomous systems. In many instances these radios are in close proximity to one another and share a common goal of relaying a message to a distant terminal. Examples include sensor networks, a swarm of UAVs, search and rescue teams, emergency response teams, police and military squads, a group of people beyond cell coverage attempting to send text messages, etc. Currently, the radio resources of the group offer no help when a single member attempts to contact the base. In this paper, we describe the implementation of a new approach to leverage group radio resources to gain a square-law growth in receive power at the base - while simultaneously suppressing a moderate amount of incidental interference at the base radio.
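The square-law growth mentioned in this abstract follows from coherent addition: when N equal-amplitude signals arrive phase-aligned, their amplitudes add, so received power grows as N squared. The sketch below is an idealized illustration only; the function name is hypothetical, and it assumes narrowband signals, equal amplitudes, and perfect phase alignment, and it ignores the interference suppression the paper also describes.

```python
import cmath

def receive_power(amplitudes, phases):
    """Power of the sum of narrowband signals with given amplitude and phase."""
    total = sum(a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases))
    return abs(total) ** 2

# Phase-aligned transmitters: power grows as N^2 (1, 4, 16, 64).
for n in (1, 2, 4, 8):
    print(n, receive_power([1.0] * n, [0.0] * n))
```

With misaligned phases the gain degrades, and two equal signals half a cycle apart cancel completely, which is why the phase alignment across radios is the hard part of any such scheme.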
