Publication


Featured research published by Cliff Young.


International Symposium on Computer Architecture | 2017

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi; Cliff Young; Nishant Patil; David Patterson; Gaurav Agrawal; Raminder Bajwa; Sarah Bates; Suresh Bhatia; Nan Boden; Al Borchers; Rick Boyle; Pierre-luc Cantin; Clifford Chao; Christopher D. Clark; Jeremy Coriell; Mike Daley; Matt Dau; Jeffrey Dean; Ben Gelb; Tara Vazir Ghaemmaghami; Rajendra Gottipati; William John Gulland; Robert Hagmann; C. Richard Ho; Doug Hogberg; John Hu; Robert Hundt; Dan Hurt; Julian Ibarz; Aaron Jaffey

Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X–30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X–80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.
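As a rough check on the headline figure in the abstract above, the 92 TOPS peak can be reproduced from the MAC count. The sketch below assumes a 700 MHz clock (a figure reported in the paper itself, not in this abstract) and counts each multiply-accumulate as two operations; the function and constant names are illustrative only.

```python
import math

# Figures from the abstract.
MACS = 65_536            # 8-bit multiply-accumulate units in the matrix unit

# Assumptions not stated in the abstract.
OPS_PER_MAC = 2          # one multiply plus one add per MAC per cycle
CLOCK_HZ = 700e6         # assumed clock rate (700 MHz, as reported in the paper)


def peak_tops(macs: int, ops_per_mac: int, clock_hz: float) -> float:
    """Peak throughput in tera-operations per second."""
    return macs * ops_per_mac * clock_hz / 1e12


# 65,536 MACs correspond to a square 256 x 256 array.
side = math.isqrt(MACS)
peak = peak_tops(MACS, OPS_PER_MAC, CLOCK_HZ)

print(f"array: {side} x {side}, peak: {peak:.2f} TOPS")
```

Under these assumptions the result rounds to the 92 TOPS quoted in the abstract. Peak figures of this kind are an upper bound; the abstract itself notes low utilization for some applications.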


Physics of Plasmas | 2012

D-T gamma-to-neutron branching ratio determined from inertial confinement fusion plasmas

Y. Kim; J. M. Mack; H. W. Herrmann; Cliff Young; Gerry Hale; S. E. Caldwell; Nelson M. Hoffman; Steve Evans; T. J. Sedillo; A. McEvoy; James R. Langenbrunner; H. H. Hsu; M. A. Huff; S. H. Batha; C. J. Horsfield; M. S. Rubery; Warren Garbett; W. Stoeffl; E. Grafil; Lee Allen Bernstein; J. A. Church; D. B. Sayre; M. Rosenberg; C. Waugh; H. G. Rinderknecht; M. Gatu Johnson; A. Zylstra; J. A. Frenje; D. T. Casey; R. D. Petrasso

A new deuterium-tritium (D-T) fusion gamma-to-neutron branching ratio [³H(d,γ)⁵He/³H(d,n)⁴He] value of (4.2 ± 2.0) × 10⁻⁵ was recently reported by this group [Y. Kim et al. Phys. Rev. C (submitted)]. This measurement, conducted at the OMEGA laser facility located at the University of Rochester, was made for the first time using inertial confinement fusion (ICF) plasmas. Neutron-induced backgrounds are significantly reduced in these experiments as compared to traditional beam-target accelerator-based experiments due to the short-pulse nature of ICF implosions and the use of gas Cherenkov γ-ray detectors with fast temporal responses and inherent energy thresholds. It is expected that this ICF-based measurement will help resolve the large and long-standing inconsistencies in previously reported accelerator-based values, which vary by a factor of approximately 30. The reported value at ICF conditions was determined by averaging the results of two methods: (1) a direct measurement of ICF D-T γ-ray and neutron ...


Physics of Plasmas | 2013

Measurement of areal density in the ablators of inertial-confinement-fusion capsules via detection of ablator (n, n′γ) gamma-ray emission

Nelson M. Hoffman; H. W. Herrmann; Y. Kim; H. H. Hsu; C. J. Horsfield; M. S. Rubery; E.K. Miller; E. Grafil; W. Stoeffl; J. A. Church; Cliff Young; J. M. Mack; D. C. Wilson; James R. Langenbrunner; Steve Evans; T. J. Sedillo; V. Yu. Glebov; T. Duffy

We report the first gamma-ray-based measurements of the areal density of ablators in inertial-confinement-fusion capsule implosions. The measurements, made at the OMEGA laser [T. R. Boehly et al., Opt. Commun. 133, 495 (1997)], used observations of gamma rays arising from inelastic scattering of 14.1-MeV deuterium-tritium (DT) neutrons on ¹²C nuclei in the compressed plastic ablators. The emission of ¹²C(n,n′γ) gamma rays from the capsules is detected using the Gamma Reaction History instrument [H. W. Herrmann et al., J. Phys.: Conf. Ser. 244, 032047 (2010)] operating at OMEGA. From the ratio of a capsule's ¹²C(n,n′γ) emission to the emission from the same processes in an in situ reference graphite “puck” of known mass and geometry [N. M. Hoffman et al., in IFSA 2011 proceedings (submitted)], we determine the time-averaged areal density of ¹²C in the capsule's compressed ablator. Measured values of total ablator areal density for thirteen imploded capsules, in the range 23 ± 10 to 58 ± 14 mg/cm², are comp...


IEEE Micro | 2018

A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution

Jeffrey Dean; David Patterson; Cliff Young

The end of Moore's law and Dennard scaling has led to the end of rapid improvement in general-purpose program performance. Machine learning (ML), and in particular deep learning, is an attractive alternative for architects to explore. It has recently revolutionized vision, speech, language understanding, and many other fields, and it promises to help with the grand challenges facing our society. The computation at its core is low-precision linear algebra. Thus, ML is both broad enough to apply to many domains and narrow enough to benefit from domain-specific architectures, such as Google's Tensor Processing Unit (TPU). Moreover, the growth in demand for ML computing exceeds Moore's law at its peak, just as it is fading. Hence, ML experts and computer architects must work together to design the computing systems required to deliver on the potential of ML. This article offers motivation, suggestions, and warnings to computer architects on how to best contribute to the ML revolution.


Communications of the ACM | 2018

A domain-specific architecture for deep neural networks

Norman P. Jouppi; Cliff Young; Nishant Patil; David A. Patterson

Tensor processing units improve performance per watt of neural networks in Google datacenters by roughly 50x.


arXiv: Computation and Language | 2016

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Yonghui Wu; Mike Schuster; Zhifeng Chen; Quoc V. Le; Mohammad Norouzi; Wolfgang Macherey; Maxim Krikun; Yuan Cao; Qin Gao; Klaus Macherey; Jeff Klingner; Apurva Shah; Melvin Johnson; Xiaobing Liu; Łukasz Kaiser; Stephan Gouws; Yoshikiyo Kato; Taku Kudo; Hideto Kazawa; Keith Stevens; George Kurian; Nishant Patil; Wei Wang; Cliff Young; Jason Smith; Jason Riesa; Alex Rudnick; Oriol Vinyals; Greg Corrado; Macduff Hughes


Physical Review C | 2012

Determination of the deuterium-tritium branching ratio based on inertial confinement fusion implosions

Y. Kim; J. M. Mack; H. W. Herrmann; Cliff Young; Gerry Hale; S. E. Caldwell; Nelson M. Hoffman; Steve Evans; T. J. Sedillo; A. McEvoy; James R. Langenbrunner; H. H. Hsu; M. A. Huff; S. H. Batha; C. J. Horsfield; M. S. Rubery; Warren Garbett; W. Stoeffl; E. Grafil; Lee Allen Bernstein; J. A. Church; D. B. Sayre; M. Rosenberg; C. Waugh; H. G. Rinderknecht; M. Gatu Johnson; A. Zylstra; J. A. Frenje; D. T. Casey; R. D. Petrasso


Neural Information Processing Systems | 2018

Deep Learning for Supercomputers: Distributed Tensor Layouts Define Distributed Computation

Noam Shazeer; Youlong Cheng; Niki Parmar; Dustin Tran; Ashish Vaswani; Penporn Koanantakool; Peter Hawkins; HyoukJoong Lee; Mingsheng Hong; Cliff Young; Ryan Sepassi; Blake Hechtman


IEEE Micro | 2018

Motivation for and Evaluation of the First Tensor Processing Unit

Norman P. Jouppi; Cliff Young; Nishant Patil; David Patterson


Bulletin of the American Physical Society | 2012

Detection of D-

Y. Kim; H. W. Herrmann; J. M. Mack; Cliff Young; Gerry Hale; Steve Evans; T. J. Sedillo; A. Cahill; C. J. Horsfield; M. S. Rubery; E. Grafil; W. Stoeffl; C. Waugh; H. G. Rinderknecht; J. A. Frenje; R. D. Petrasso; E. Kirk Miller

Collaboration


Dive into Cliff Young's collaborations.

Top Co-Authors

J. M. Mack

Los Alamos National Laboratory

Steve Evans

University of Cambridge

H. W. Herrmann

Los Alamos National Laboratory

T. J. Sedillo

Los Alamos National Laboratory

C. J. Horsfield

Atomic Weapons Establishment

D. C. Wilson

Los Alamos National Laboratory

James R. Langenbrunner

Los Alamos National Laboratory

W. Stoeffl

Lawrence Livermore National Laboratory

Y. Kim

Los Alamos National Laboratory
