Andrew Lines
California Institute of Technology
Publications
Featured research published by Andrew Lines.
Conference on Advanced Research in VLSI | 1997
Alain J. Martin; Andrew Lines; Rajit Manohar; Mika Nyström; Paul I. Pénzes; Robert Southworth; Uri Cummings; Tak Kwan Lee
The design of an asynchronous clone of a MIPS R3000 microprocessor is presented. In 0.6 µm CMOS, we expect performance close to 280 MIPS, for a power consumption of 7 W. The paper describes the structure of a high-performance asynchronous pipeline, in particular precise exceptions, pipelined caches, arithmetic, and registers, and the circuit techniques developed to achieve high throughput.
International Symposium on Microarchitecture | 2004
Andrew Lines
System-on-chip (SoC) designs integrate a variety of cores and I/O interfaces, which usually operate at different clock frequencies. Communication between unlocked clock domains requires careful synchronization, which inevitably introduces metastability and some uncertainty in timing. Thus, any chip with multiple clock domains is already globally asynchronous. We have devised a more elegant and efficient solution to the multiple-clock-domain problem. Instead of gluing synchronous domains directly to each other with clock-domain bridges, we use asynchronous-circuit design techniques to handle all clock-domain crossing as well as all cross-chip communication and routing. The phase-locked loop (PLL) and clock distribution can be entirely local to each synchronous core, easing timing closure and improving the reusability of cores across multiple designs. Our solution, Nexus, is a globally asynchronous, locally synchronous (GALS) interconnect that features a 16-port, 36-bit asynchronous crossbar. The crossbar connects through asynchronous channels to clock-domain converters for each synchronous module. To ensure that Nexus will work robustly in a commercial application, we developed and applied many verification and test strategies, including novel variations of noise analysis, timing analysis, and fault and delay testing.
IEEE International Symposium on Asynchronous Circuits and Systems | 2006
Peter A. Beerel; Andrew Lines; Mike Davies; Namhoon Kim
Slack matching is the problem of adding pipeline buffers to an asynchronous pipelined design in order to prevent stalls and improve performance. This paper addresses the problem of minimizing the cost of additional pipeline buffers needed to achieve a given performance target. An intuitive analysis is given that is then formalized using marked graph theory. This leads to a mixed integer linear programming (MILP) solution of the problem. Theory is then presented that identifies under what circumstances the MILP solution admits a polynomial-time solution. For other circumstances, a polynomial-time approximate algorithm using linear programming is proposed. Experimental results on a large set of benchmark circuits demonstrate the computational feasibility and effectiveness of both approaches.
Proceedings of 1994 IEEE Symposium on Advanced Research in Asynchronous Circuits and Systems | 1994
Uri Cummings; Andrew Lines; Alain J. Martin
We derive an asynchronous, delay-insensitive CMOS circuit to implement a finite impulse response lattice structure filter. Simulation indicates a performance in the range of 380 million multiplications and 980 million additions per second in Hewlett-Packard's 0.8 µm technology (λ = 0.5 µm). We obtain high throughput by using deep pipelines and buffering the carry chains of adders and multipliers. Our work demonstrates that formal design can easily yield circuits which are safe and fast.
IEEE Micro | 2018
Mike Davies; Narayan Srinivasa; Tsung-Han Lin; Gautham N. Chinya; Yongqiang Cao; Sri Harsha Choday; Georgios D. Dimou; Prasad Joshi; Nabil Imam; Shweta Jain; Yuyun Liao; Chit-Kwan Lin; Andrew Lines; Ruokun Liu; Deepak A. Mathaikutty; Steven McCoy; Arnab Paul; Jonathan Tse; Guruguhanathan Venkataramanan; Yi-Hsin Weng; Andreas Wild; Yoonseok Yang; Hong Wang
Loihi is a 60-mm² chip fabricated in Intel's 14-nm process that advances the state-of-the-art modeling of spiking neural networks in silicon. It integrates a wide range of novel features for the field, such as hierarchical connectivity, dendritic compartments, synaptic delays, and, most importantly, programmable synaptic learning rules. Running a spiking convolutional form of the Locally Competitive Algorithm, Loihi can solve LASSO optimization problems with over three orders of magnitude superior energy-delay-product compared to conventional solvers running on a CPU iso-process/voltage/area. This provides an unambiguous example of spike-based computation, outperforming all known conventional solutions.
Symposium on Asynchronous Circuits and Systems | 2009
Jonathan Dama; Andrew Lines
This paper details the design of ≫ 1GHz pipelined asynchronous SRAMs in TSMC's 65nm GP process. We show how targeted timing assumptions improve an otherwise quasi delay-insensitive (QDI) design. The speed, area, and power of our SRAMs are compared to commercially available synchronous SRAMs in the same technology. We also present novel techniques for implementing large pseudo dual-ported memories that support simultaneous reads and writes. The most sophisticated of three designs yields a fully provisioned dual-ported memory using multiple single-ported banks connected by dual-ported buses, plus a small side-band memory to avoid bank conflicts. We discuss our solutions for manufacturing defects, soft errors, and analog robustness with attention to advantages and challenges of our asynchronous methodology. Laboratory measurements of a test chip demonstrate correct functionality at speeds well over a GHz. Our single-ported SRAM designs are larger but faster than the alternate synchronous designs, while our novel dual-ported implementations can be both smaller and much faster. These technology advantages lead directly to competitive advantages for our future commercial products.
IEEE International Symposium on Asynchronous Circuits and Systems | 2014
Mike Davies; Andrew Lines; Jon Dama; Alain Gravel; Robert Southworth; Georgios D. Dimou; Peter A. Beerel
The design of a commercially-shipping 72-port 10G Ethernet switch router integrated circuit is presented. The 1.2 billion transistor chip consists of a core of > 1GHz asynchronous circuits surrounded by standard synchronous logic for external interfaces. It is manufactured in a TSMC 65nm process. The asynchronous circuitry includes 15MB of single-ported SRAM, 150KB of dual-ported SRAM, 100KB of TCAM, Tb bandwidth crossbars, and a fully pipelined programmable packet processor processing one billion packets per second. The design implementation relied heavily on a novel tool flow utilizing both commercial and proprietary EDA tools for automatic place-and-route of asynchronous layout.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2014
Georgios D. Dimou; Peter A. Beerel; Andrew Lines
This paper proposes the method of generating asynchronous circuits from hardware description language specifications by clustering the synthesized gates into asynchronous pipeline stages while preserving liveness, meeting throughput and latency constraints, and minimizing area. This method provides a form of automatic pipelining in which the throughput of the overall design is not limited to the clock frequency or the level of pipelining in the original Register-Transfer Level (RTL) specification. The method is design-style agnostic and is thus applicable to many asynchronous design styles.
IEEE International Symposium on Asynchronous Circuits and Systems | 2013
Jonathan Tse; Andrew Lines
Innovative asynchronous circuits are central to the Ethernet switch chips from Intel's Switch and Router Division (formerly Fulcrum Microsystems). These circuits are complex, and it can be hard to gauge their benefits since there are few direct comparisons. For this paper, we apply the technology and tool flow developed for these commercial products to a familiar benchmark: a network of general-purpose processors on a chip. The processor is a single-issue 32-bit integer RISC core, a from-scratch implementation mostly compatible with the MIPS R3000. The network uses a 16-port 32-bit fully connected Nexus crossbar. We achieve greater scalability by linking these crossbars in a 2D mesh with clusters of 8 cores and 4 cardinal and 4 diagonal links per tile. Each core has 64KB of local memory and can access the memory of any other core in the mesh. Our design makes heavy use of the Proteus synthesis, place & route flow, as well as existing custom cells. It required only a few man-months of effort to develop a complete gate-level design and physical floor-plan which can run simple C programs such as Dhrystone. A few more man-months will produce a test chip, expected in 2013.
Power and Timing Modeling, Optimization and Simulation | 2011
Georgios D. Dimou; Peter A. Beerel; Andrew Lines
This paper proposes the method of generating asynchronous circuits from hardware description language specifications by clustering the synthesized gates into asynchronous pipeline stages while preserving liveness, meeting throughput and latency constraints, and minimizing area. This method provides a form of automatic pipelining in which the throughput of the overall design is not limited to the clock frequency or the level of pipelining in the original Register-Transfer Level (RTL) specification. The method is design-style agnostic and is thus applicable to many asynchronous design styles.