Network
                            
                            Latest external collaboration on country level. Dive into details by clicking on the dots.
                                 Publication
                            
                            Featured researches published by Charles Luther Johnson.
custom integrated circuits conference | 1989
Dennis Thomas Cox; David Leroy Guertin; Charles Luther Johnson; Bruce George Rudolph; Robert Russell Williams; Ronald A. Piro; Douglas W. Stout
A major problem in VLSI system design is controlling off-chip driver characteristics and skew in clock generation as process parameters, temperature, and supply voltage vary. A control circuit methodology has been developed that senses the relative performance of a CMOS chip and transmits a digitally encoded state to off-chip driver and clock generation circuits to control their operating characteristics
international symposium on microarchitecture | 2011
Jeffrey Douglas Brown; Sandra S. Woodward; Brian Mitchell Bass; Charles Luther Johnson
The IBM Power Edge of Network processor combines the attributes of a general-purpose processing subsystem with function accelerators and networking interfaces to create a system on a chip thats targeted for applications at the edge of network. This article discusses in detail the processing, accelerator, and network interface subsystems and explores applications well suited to the PowerEN processor.
international symposium on low power electronics and design | 2011
Yibo Chen; Eren Kursun; Dave Motschman; Charles Luther Johnson; Yuan Xie
The three-dimensional integrated circuits (3D ICs) offer performance advantages thanks to the increased bandwidth and reduced wire-length enabled by through-silicon-via structures (TSVs). Traditionally TSVs have been considered to improve the thermal conductivity in the vertical direction. However, the lateral thermal blockage effect becomes increasingly important for TSV via farms (a cluster of TSV vias used for signal bus connections between layers). TSV farms can cause different thermal effects on different layers due to the unequal x, y, z thermal conductivities. This can exhibit itself as thermal improvement in the vertical heat flow, at the same time lateral heat blockage effects in thinned pass-through layers. In this paper, we propose a thermal-aware via farm placement technique for 3D ICs to minimize lateral heat blockages caused by dense signal bus TSV structures. By incorporating thermal conductivity profile of via farm blocks in the design flow and enabling placement/aspect ratio optimization, the corresponding hotspots can be minimized within the wire-length and area constraints.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2013
Yibo Chen; Eren Kursun; Dave Motschman; Charles Luther Johnson; Yuan Xie
3-D integrated circuits (3-D ICs) offer performance advantages due to their increased bandwidth and reduced wire-length enabled by through-silicon-via structures (TSVs). Traditionally TSVs have been considered to improve the thermal conductivity in the vertical direction. However, the lateral thermal blockage effect becomes increasingly important for TSV via farms (a cluster of TSV vias used for signal bus connections between layers) because the TSV size and pitch continue to scale in μm range and the metal to insulator ratio becomes smaller. Consequently, dense TSV farms can create lateral thermal blockages in thinned silicon substrate and exacerbate the local hotspots. In this paper, we propose a thermal-aware via farm placement technique for 3-D ICs to minimize lateral heat blockages caused by dense signal bus TSV structures. By incorporating thermal conductivity profile of via farm blocks in the design flow and enabling placement/aspect ratio optimization, the corresponding hotspots can be minimized within the wire-length and area constraints.
design, automation, and test in europe | 2011
Jeonghee Shin; John A. Darringer; Guojie Luo; Alan J. Weger; Charles Luther Johnson
The design of high-performance servers has always been a challenging art. Now, server designers are being asked to explore a much larger design space as they consider multicore heterogeneous architecture and the limits of advancing silicon technology. Bringing automation to the early stages of design can enable more rapid and accurate trade-off analysis. In this paper, we introduce an Early Chip Planner which allows designers to rapidly analyze microarchitecture, physical and package design trade-offs for 2D and 3D VLSI chips and generates an attributed netlist to be carried on to the implementation stage. We also describe its use in planning a 3D special-purpose server processor.
high performance computer architecture | 2012
Augusto Vega; Pradip Bose; Alper Buyuktosunoglu; Jeff H. Derby; Michele M. Franceschini; Charles Luther Johnson; Robert K. Montoye
In wireless networks, base stations are responsible for operating on large amounts of traffic at high speed rates. With the advent of new standards, as 4G, further pressure is put in the hardware requirements to satisfy speeds of up to 1 Gbps. In this work, we study the applicability and potential benefits of the IBM PowerEN processor (a multi-core, massively multithreaded platform) in the realm of base stations for the 3G and 4G standards. The approach involves exploiting the throughput computation capabilities of the PowerEN processor, replacing the bus-attached special-function accelerators with a layer of in-line universal acceleration support, incorporated within the cores. A key feature of this in-line accelerator is a bank-based very-large register file, with embedded SIMD support. This processor-in-regfile (PIR) strategy is implemented as local computation elements (LCEs) attached to each bank, overcoming the limited number of register file ports. Because each LCE is a SIMD computation element, and all of them can proceed concurrently, the PIR approach constitutes a highly-parallel super-wide-SIMD device. To target a broad spectrum of applications for base stations, we also consider a PIR-based architecture built upon reconfigurable LCEs. In this paper, we evaluate the in-line universal accelerator and the PIR strategy focusing on two specific applications for base stations: FFT and Turbo Decoding.
international conference on acoustics, speech, and signal processing | 2013
Dheeraj Sreedhar; Jeff H. Derby; Augusto Vega; B. Rogers; Charles Luther Johnson; Robert K. Montoye
The high speed uplink packet access (HSUPA) wireless standard requires extremely high-performance signal processing in the baseband receiver, the most challenging being the chip rate rake receiver. In this paper we describe the architectural enhancements on the IBMs PowerEN processor, to enable it to support the computational requirements of the rake receiver in a fully programmable and scalable fashion. A key feature of these enhancements is a bank-based very-large register file, with embedded single instruction multiple data (SIMD) support. This processor-in-regfile (PIR) strategy is implemented as local computation elements (LCEs) attached to each bank. This overcomes the limitation on the number of register file ports and at the same time enables high degree of parallelism. We show that these enhancements enable the integration of multi-sector HSUPA G-RAKE receivers on a single processor.
ieee international conference on high performance computing, data, and analytics | 2014
Dheeraj Sreedhar; Jeff H. Derby; Robert K. Montoye; Charles Luther Johnson
Dense matrix-matrix multiply is an important kernel in many high performance computing applications including the emerging deep neural network based cognitive computing applications. Graphical processing units (GPU) have been very successful in handling dense matrix-matrix multiply in a variety of applications. However, recent research has shown that GPUs are very inefficient in using the available compute resources on the silicon for matrix multiply in terms of utilization of peak floating point operations per second (FLOPS). In this paper, we show that an architecture with a large register file supported by “indirection ” can utilize the floating point computing resources on the processor much more efficiently. A key feature of our proposed in-line accelerator is a bank-based very-large register file, with embedded SIMD support. This processor-in-regfile (PIR) strategy is implemented as local computation elements (LCEs) attached to each bank, overcoming the limited number of register file ports. Because each LCE is a SIMD computation element, and all of them can proceed concurrently, the PIR approach constitutes a highly-parallel super-wide-SIMD device. We show that we can achieve more than 25% better performance than the best known results for matrix multiply using GPUs. This is achieved using far lesser floating point computing units and hence lesser silicon area and power. We also show that architecture blends well with the Strassen and Winograd matrix multiply algorithms. We optimize the selective data parallelism that the LCEs enable for these algorithms and study the area-performance trade-offs.
Archive | 1991
Charles Luther Johnson; Robert Francis Lembach; Bruce George Rudolph; Robert Russel Williams
Archive | 1989
Joseph J. Cahill; Charles Luther Johnson; Steven Dean Lewis; Timothy John Mullins; Bruce Ralph Petz
