Publication


Featured research published by Jinuk Luke Shin.


IEEE Journal of Solid-State Circuits | 2011

A 40 nm 16-Core 128-Thread SPARC SoC Processor

Jinuk Luke Shin; Dawei Huang; Bruce Petrick; Changku Hwang; Kenway Tam; Alan P. Smith; Ha Pham; Hongping Penny Li; Timothy Johnson; Francis Schumacher; Ana Sonia Leon; Allan Strong

This fourth-generation UltraSPARC T3 SoC processor implements sixteen 8-threaded SPARC cores to double on-chip thread count and throughput performance over its previous generation. It enhances glueless scalability to enable up to 512 threads in a 4-way system. A 16-bank 6 MB L2 cache, a 512 GB/s hierarchical crossbar and a 312-lane SerDes I/O of 2.4 Tb/s support the bandwidth required by the large number of threads. This SoC processor integrates the memory controller, PCIE 2.0, 10 Gb Ethernet ports, and the cache coherency support required in multi-chip configurations. Multiple clock and power domains are used to optimize performance and power for the SoC components. Extensive power management features, from architecture to circuit techniques, optimize both active and idle power. The 377 mm² die includes 1 billion transistors in a flip-chip ceramic package with 2117 pins. The chip is fabricated in TSMC's 40 nm high-performance process with 11 Cu metals and four transistor types.


International Solid-State Circuits Conference | 2012

The next-generation 64b SPARC core in a T4 SoC processor

Jinuk Luke Shin; Robert T. Golla; Hongping Penny Li; Sudesna Dash; Youngmoon Choi; Alan P. Smith; Harikaran Sathianathan; Mayur Joshi; Heechoul Park; Mohamed Elgebaly; Sebastian Turullols; Song Kim; Robert P. Masleid; Georgios K. Konstadinidis; Mary Jo Doherty; Greg Grohoski; Curtis McAllister

The T4 microprocessor introduces the next-generation dual-issue, out-of-order SPARC core that delivers up to 5× integer and 7× floating-point single-thread performance improvement for both commercial and industry-standard workloads. Eight SPARC cores, a crossbar and a unified 16-way 4MB L3 cache are implemented in the same system-on-chip platform as the predecessor T3 to utilize established coherency (CLC), DDR3 (MCU), PCIE Gen2 (PEU) and 1G/10G Ethernet interfaces (NIL). Further, the T4's pin, thermal and power compatibility with the previous generation enables faster time to market for new multi-socket systems. The 403 mm² die has 855 million transistors of four different types and 12 metal layers fabricated using TSMC's 40nm process.


International Solid-State Circuits Conference | 2013

A 3.6 GHz 16-Core SPARC SoC Processor in 28 nm

Jason M. Hart; Hoyeol Cho; Yuefei Ge; Gregory Gruber; Dawei Huang; Changku Hwang; Daisy Jian; Timothy Johnson; Georgios K. Konstadinidis; Venkatram Krishnaswamy; Lance Kwong; Robert P. Masleid; Rakesh Mehta; Umesh Nawathe; Harikaran Sathianathan; Yongning Sheng; Jinuk Luke Shin; Sebastian Turullols; Zuxu Qin; King C. Yen

The 3.6 GHz SPARC T5 processor is Oracle's next-generation CMT SoC processor implemented in TSMC's 28 nm process with 1.5 billion transistors. Significant performance improvements were made by doubling the previous generation's number of cores to 16 and L3 cache size to 8 MB while increasing bandwidth by nearly 3×. Power efficiency was improved through features like DVFS, core-pair cycle skipping and SerDes power scaling. The SPARC T5 processor has been designed to fit in systems that can scale from 1 to 8 sockets, or 128 to 1024 threads, in glueless fashion. The diverse system-level bandwidth requirements of up to 5.65 TB/s in these systems are met by an advanced SerDes design that handles up to 30 dB loss in an area- and power-efficient manner. The different thermal envelopes of these systems are addressed by power management features that span software, system and chip design.
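The thread counts quoted in this abstract follow from simple multiplication. The sketch below is only an illustration: it assumes 8 threads per core, which is what 128 threads across 16 cores implies, and the function name `system_threads` is hypothetical.

```python
def system_threads(sockets, cores_per_socket=16, threads_per_core=8):
    """Total hardware threads in a glueless multi-socket system.

    Defaults reflect the T5 figures in the abstract (16 cores per chip);
    threads_per_core=8 is inferred from 128 threads / 16 cores.
    """
    return sockets * cores_per_socket * threads_per_core


# One socket gives 128 threads and eight sockets give 1024,
# matching the "128 to 1024 threads" range in the abstract.
for sockets in (1, 2, 4, 8):
    print(f"{sockets} socket(s): {system_threads(sockets)} threads")
```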


International Solid-State Circuits Conference | 2015

4.3 Fine-grained adaptive power management of the SPARC M7 processor

Venkatram Krishnaswamy; Jeffrey S. Brooks; Georgios K. Konstadinidis; Curtis McAllister; Ha Pham; Sebastian Turullols; Jinuk Luke Shin; Yifan YangGong; Haowei Zhang

The power management system described in this paper enables more than 3× increase in power-constrained performance over the previous generation of SPARC server CPUs [2]. The low latency and high performance of the system are possible due to accurate, high-bandwidth sensors, fast on-die control and fine-grained actuation implemented using both clock cycle skipping and DVFS, as required by the time constants of system constraints.


International Solid-State Circuits Conference | 2015

4.2 A 20nm 32-Core 64MB L3 cache SPARC M7 processor

Penny Li; Jinuk Luke Shin; Georgios K. Konstadinidis; Francis Schumacher; Venkatram Krishnaswamy; Hoyeol Cho; Sudesna Dash; Robert P. Masleid; Chaoyang Zheng; Yuanjung David Lin; Paul Loewenstein; Heechoul Park; Vijay Srinivasan; Dawei Huang; Changku Hwang; Wenjay Hsu; Curtis McAllister

The SPARC M7 processor delivers more than 3× throughput performance improvement over its predecessor SPARC M6 for commercial applications. It introduces new design features, such as the S4 core, a 64MB L3 cache subsystem with application data integrity, a low-latency, high-throughput on-chip network (OCN), a database analytic accelerator (DAX), fine-grain adaptive power management and 1.5× higher SerDes I/O bandwidth for memory, coherency and system interfaces (Fig. 4.2.1) [1]. The enhancements in the S4 core over the S3 core [2] include a new L2 cache scheme, support for visual instruction set (VIS) extensions, virtual address masking and user-level synchronization instructions to provide continuous single-thread performance improvement for SPARC processors since SPARC T4. In addition, a hierarchical modular approach, called SPARC cache cluster (SCC), is used for the core-L2-L3 cache system. Within the SCC, all four cores share a single 256KB L2 instruction cache and each core pair has its own 256KB L2 data cache. The L2 caches are organized as 2 banks and 8 ways to deliver greater than 1TB/s bandwidth to the four cores. This L2 system delivers 2× more throughput for each core with a 1.5× increase in size and the same latency as the previous generation L2 cache scheme. The L2 caches connect to an 8MB, 8-way set-associative partitioned L3 cache. Having a localized L3 cache within each SCC reduces L3 latency by 25%. The chip contains eight SCCs for a total of 32 cores with 256 threads and a 64MB L3 cache with 1.6TB/s bandwidth. In order to support the bandwidth and latency requirements from 256 threads and other system agents, the OCN architecture is implemented in place of the crossbar-based network used in previous SPARC processors. Each SCC connects to the OCN, which in turn connects to four on-chip memory controllers (MCUs), coherency systems and eight database analytic accelerator (DAX) engines.
The SPARC M7 introduces a customized DAX engine in an effort to optimize performance for Oracle databases. Eight DAX engines handle simple query predicates, decompression, message passing and interrupts across cluster nodes. This query accelerator provides up to 10× better performance for single-stream decompression.
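The chip-level totals in this abstract can be cross-checked against the per-SCC figures. The sketch below is a rough consistency check, not part of the paper: the variable names are hypothetical, and the threads-per-core and L2-per-SCC values are derived here rather than stated in the abstract.

```python
# Figures taken from the abstract.
SCCS_PER_CHIP = 8      # SPARC cache clusters per chip
CORES_PER_SCC = 4      # cores sharing one SCC
L3_MB_PER_SCC = 8      # localized L3 partition per SCC
TOTAL_THREADS = 256    # hardware threads per chip
L2_CACHE_KB = 256      # size of each L2 instruction or data cache

# Derived values (assumptions of this sketch, not quoted in the abstract).
cores = SCCS_PER_CHIP * CORES_PER_SCC
threads_per_core = TOTAL_THREADS // cores
l3_total_mb = SCCS_PER_CHIP * L3_MB_PER_SCC
core_pairs_per_scc = CORES_PER_SCC // 2
# One shared L2 instruction cache plus one L2 data cache per core pair.
l2_kb_per_scc = L2_CACHE_KB + core_pairs_per_scc * L2_CACHE_KB

print(f"cores per chip:   {cores}")
print(f"threads per core: {threads_per_core}")
print(f"total L3 (MB):    {l3_total_mb}")
print(f"L2 per SCC (KB):  {l2_kb_per_scc}")
```

The core count and total L3 size reproduce the 32 cores and 64MB quoted above.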


Asian Solid-State Circuits Conference | 2014

Asymmetric Frequency Locked Loop (AFLL) for adaptive clock generation in a 28nm SPARC M6 processor

Yifan YangGong; Sebastian Turullols; Daniel Woo; Changku Huang; King C. Yen; Venkatram Krishnaswamy; Kalon S. Holdbrook; Jinuk Luke Shin

In order to minimize the impact of on-chip L di/dt noise on power and performance, Oracle's SPARC M6 processor features an Asymmetric Frequency Locked Loop (AFLL) that dynamically adjusts chip frequency. It achieves 15% improved noise immunity by reacting to the voltage noise asymmetrically through the use of a pair of DCOs that accurately track the response of critical paths. The AFLL is implemented in a 28nm CMOS process in 0.045 mm² of area, dissipating 14mW and reducing jitter by 50%.


International Solid-State Circuits Conference | 2013

Bandwidth and power management of glueless 8-socket SPARC T5 system

Venkatram Krishnaswamy; Dawei Huang; Sebastian Turullols; Jinuk Luke Shin

Continuous advancement in multicore and multi-threaded design requires optimized integration of hardware and software to address increasing bandwidth and power management challenges for high-end system designs. The next-generation Oracle T-series systems utilizing the SPARC T5 processor address these challenges. These systems scale from one to eight sockets using a 1-hop glueless connection. The processor implements 16 8-threaded cores, an 8MB L3 cache, four on-chip memory controllers and two on-chip PCIE Gen 3 interfaces [1]. The 8-socket system comprises 1024 threads, the highest thread count in any T-series system. The fully configured 8-socket T5 system supports DDR3-1066-based memory bandwidth of over 2.9TB/s, coherence bandwidth of 2+TB/s and PCIE Gen 3 bandwidth of 256GB/s to deliver 5+TB/s throughput (Fig. 3.7.1).
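The aggregate throughput figure can be reproduced by summing the three components named in the abstract. This is only a back-of-the-envelope sketch: the memory and coherence values are the stated lower bounds ("over 2.9TB/s", "2+TB/s"), decimal units (1 TB = 1000 GB) are assumed, and the variable names are hypothetical.

```python
# Lower-bound component bandwidths for the 8-socket system, in TB/s.
memory_tb_s = 2.9          # "over 2.9TB/s" DDR3-1066 memory bandwidth
coherence_tb_s = 2.0       # "2+TB/s" coherence bandwidth
pcie_tb_s = 256 / 1000     # 256 GB/s PCIe Gen 3, assuming 1 TB = 1000 GB

total_tb_s = memory_tb_s + coherence_tb_s + pcie_tb_s

# The sum of the lower bounds is about 5.16 TB/s,
# consistent with the "5+TB/s" figure in the abstract.
print(f"aggregate lower bound: {total_tb_s:.2f} TB/s")
```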


Asian Solid-State Circuits Conference | 2010

A 40nm 16-core 128-thread SPARC® SoC processor

Jinuk Luke Shin; Dawei Huang; Bruce Petrick; Changku Hwang; Ana Sonia Leon; Allan Strong

This fourth-generation UltraSPARC T3 SoC processor (code named Rainbow Falls) implements sixteen 8-threaded SPARC cores to double on-chip thread count and throughput performance over its previous generation. It enhances glueless scalability to enable up to 512 threads in a 4-way system configuration. The 16-bank 6MB L2 cache, the 512GB/s hierarchical crossbar and the 312-lane SerDes I/O of 2.4Tb/s support the required high bandwidth. This SoC processor integrates a memory controller, PCIE 2.0, 10Gb Ethernet ports, and support for coherency. Multiple clock and power domains optimize performance and power for the SoC components. Extensive power management features, from architecture to circuit techniques, minimize both active and idle power. The 377 mm² die includes 1 billion transistors in a flip-chip ceramic package with 2117 pins. The chip is fabricated in TSMC's 40nm high-performance process with 11 Cu metals and four transistor types.


Asian Solid-State Circuits Conference | 2012

The 3.0GHz 64-thread SPARC T4 processor

Jinuk Luke Shin; Robert T. Golla; Hongping Li; Sudesna Dash; Mary Jo Doherty; Greg Grohoski; Curtis McAllister

The SPARC T4 processor introduces the next-generation multi-threaded S3 core and delivers a significant single-thread performance improvement over its predecessor. The chip integrates eight S3 cores, an 8-bank 4MB L3 cache, a 768GB/s crossbar, a memory controller, PCI Gen2.0, 10G Ethernet and a cache coherency controller with 2.4Tb/s high-speed I/Os. The dual-issue, out-of-order execution core features a 16-stage integer pipeline, extensive branch prediction, dynamic threading and an enhanced cryptographic processing unit. The 406 mm² die is fabricated in TSMC's 40nm process and contains 855 million transistors and 2.6 million flip-flops in a flip-chip ceramic package. Enhanced physical design methodologies and extensive power management features enable 3.0GHz operation in the same power envelope as its predecessor.


Asian Solid-State Circuits Conference | 2013

A 28nm 3.6GHz 128 thread SPARC T5 processor and system applications

Venkatram Krishnaswamy; Jinuk Luke Shin; Sebastian Turullols; Jason M. Hart; Georgios K. Konstadinidis; Dawei Huang

The SPARC T5 processor implements 16 8-threaded SPARC S3 cores, an 8-MB 16-way set-associative L3 cache, 8 BL8 DDR3-1066 schedulers, and integrated PCIe Gen-3. The processor doubles the performance of the previous generation SPARC T4 CPU and expands support to systems of up to 8 sockets in a single-hop glueless fashion. It is implemented in the TSMC 28nm process using 1.5 billion transistors and a 13-layer metal stack. The chip has a maximum operating frequency of 3.6 GHz.
