A Retrospective Recount of Computer Architecture Research with a Data-Driven Study of Over Four Decades of ISCA Publications
Omer Anjum, Wen-Mei Hwu, Jinjun Xiong
IBM-ILLINOIS Center for Cognitive Computing Systems Research
June 25, 2019
The ACM/IEEE International Symposium on Computer Architecture (ISCA) conference is one of the premier forums for presenting, debating, and advancing new ideas and experimental results in computer architecture. According to recent calls for papers, research topics are solicited on a broad range of areas, including, but not limited to: processors, memories, storage systems, architectures, interconnection networks, instructions, thread-level parallelism, data-level parallelism, dependable architectures, architecture support for parallel software development, architecture support for security, power- and energy-efficient architectures, application-specific architectures, reconfigurable architectures, embedded architectures, network and router architectures, architectures for emerging technologies, and architecture modeling and performance evaluation.
Figure 1 (overview of the pipeline) components: data collection and phrase mining over all abstracts, given the name of the conference and its year of inauguration; counting abstracts per phrase within each time span; sorting phrases by count to obtain the top phrases per span and the delta between adjacent spans; and disambiguating academia vs. industry from first-author affiliations, counting abstracts per affiliation per year to derive contribution trends from academia and industry.
Figure 1: An Overview of the Pipeline

Recently we conducted a study of some notable publication trends for ISCA from 1973, when it was inaugurated, to 2018. The main question we were trying to answer was how the topics, and thus the community's interests, evolved over these 45 years. Our data set includes all the abstracts of papers published in the conference. The source of our data set is Microsoft Academic Graph [1]. The pipeline we developed for producing the results is shown in Figure 1. We will discuss key features and current limitations of the pipeline in a later section of this blog. We plan to make the pipeline available as an open-source project to enable similar studies for other conferences and domains.

We first show the number of papers published at ISCA in Figure 2, where we observe an increase in the number of papers published in the 1980s and then in the 2010s. The highest peaks are observed in 1992 and 2018. It would be interesting to see whether the number of papers published at ISCA will decline in the coming years, completing another cycle. In Figure 3 we observe that the average number of authors per paper has also increased over time. Based on past trends, it appears that this trend may continue for a few more years, and the average number of authors per submission will likely keep increasing. In Figure 4, we see a long-term decline in the share of papers whose first authors are from industry. We believe this trend is caused by two important forces. First, there is a decline of publication-oriented research activities in industry in favor of product development. Second, the amount of "paper engineering" effort needed to write a competitive paper for acceptance by ISCA has been increasing over time. While graduate students can spend the effort, it is not clear that industry researchers can justify it.

We then make a few observations on longitudinal trends in the ISCA community. In Figure 5, we see three types of long-term patterns. The first type of topic, such as data flow, RISC, parallel processing, instruction-level parallelism, multicore processor, microprogramming, branch prediction, and shared memory, receives intensive coverage for a limited period of time. We suspect the interest in these topics subsides once they are considered either mature or no longer of interest. Please note that the scale of the y-axis (the number of paper abstracts mentioning the topic) varies among topics. The second type, such as memory access, speculate, operating systems, programming languages, cache coherence, CPU, and memory systems, receives periodic surges of coverage. Because these topics are about architecture support for key parts of computing systems, they draw research interest whenever the industry migrates to a new technology or paradigm. The third type, such as GPU, power consumption, FPGA, energy efficiency, and DRAM, has received increasing coverage in recent years. Although these topics may turn out to belong to the first type when we look back a few years from now, it is too early to tell.
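To make the counting behind these trend curves concrete, the following is a minimal sketch of how abstracts per phrase per time span, and the deltas between adjacent spans, could be tallied. It is an illustration rather than the actual pipeline: the record layout, the hard-coded phrase list, the uniform five-year bucketing, and the lower-cased substring match are all simplifying assumptions, whereas the real pipeline works from Microsoft Academic Graph data and a phrase-mining step.

```python
from collections import Counter

# Illustrative records only: each paper is assumed to carry its publication
# year and abstract text. In practice, the phrases come from phrase mining.
abstracts = [
    {"year": 1992, "abstract": "A shared memory multiprocessor design ..."},
    {"year": 2018, "abstract": "Improving the energy efficiency of GPU kernels ..."},
]
phrases = ["shared memory", "branch prediction", "energy efficiency", "gpu"]

def span_of(year, start=1971, width=5):
    # Uniform 5-year buckets (1971-75, 1976-80, ...); the paper's first and
    # last spans are actually 3 years, which this simplification ignores.
    return start + ((year - start) // width) * width

def count_per_span(abstracts, phrases):
    """For each span, count how many abstracts mention each phrase at least once."""
    counts = {}  # span start year -> Counter(phrase -> number of abstracts)
    for rec in abstracts:
        span = span_of(rec["year"])
        text = rec["abstract"].lower()
        hits = {p for p in phrases if p in text}  # naive substring match
        counts.setdefault(span, Counter()).update(hits)
    return counts

def delta(counts, span_a, span_b):
    """Change in coverage for each phrase between two adjacent spans."""
    a, b = counts.get(span_a, Counter()), counts.get(span_b, Counter())
    return {p: b[p] - a[p] for p in set(a) | set(b)}

counts = count_per_span(abstracts, phrases)
top_per_span = {s: c.most_common(10) for s, c in counts.items()}
print(top_per_span)
print(delta(counts, 1991, 1996))
```

A real run would use the mined phrase occurrences instead of substring matching and would need to merge semantically related phrases, which is one source of the small counting errors acknowledged below.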
Each figure in this section represents a cloud of representative phrases that appeared in abstracts over a span of five years, except that the first and last figures cover a span of three years. We count the number of papers whose abstracts contain a match to our query phrase. We expect some small but potentially significant errors in the calculated weights for the phrases, due to using only abstracts and to some missing semantic relations between phrases in our natural language processing pipeline.

Figure 2: Number of Papers Each Year

Figure 3: Average Number of Authors in a Paper

Figure 4: Percentage of industry vs. academia affiliation of first authors
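Figure 4 relies on labeling each first author's affiliation as academia or industry. The article does not spell out the exact disambiguation rules, so the sketch below is only a plausible keyword heuristic; the hint lists, the `first_author_affiliation` field, and the handling of unknown affiliations are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical hint lists; a real pipeline would rely on a curated mapping.
ACADEMIA_HINTS = ("university", "institute of technology", "college", "polytechnic")
INDUSTRY_NAMES = ("ibm", "intel", "nvidia", "microsoft")
INDUSTRY_SUFFIXES = ("inc", "corp", "ltd", "labs")

def classify_affiliation(affiliation: str) -> str:
    """Rough heuristic: label an affiliation string as academia, industry, or unknown."""
    text = affiliation.lower()
    tokens = text.replace(",", " ").replace(".", " ").split()
    if any(hint in text for hint in ACADEMIA_HINTS):
        return "academia"
    if any(s in tokens for s in INDUSTRY_SUFFIXES) or any(n in tokens for n in INDUSTRY_NAMES):
        return "industry"
    return "unknown"  # left for manual resolution or a curated list

def industry_share_per_year(papers):
    """papers: iterable of dicts with 'year' and 'first_author_affiliation' keys."""
    totals, industry = defaultdict(int), defaultdict(int)
    for p in papers:
        label = classify_affiliation(p["first_author_affiliation"])
        if label == "unknown":
            continue
        totals[p["year"]] += 1
        industry[p["year"]] += int(label == "industry")
    return {year: industry[year] / totals[year] for year in sorted(totals)}
```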
The 1960s were the formative years of the computer industry, when only a few companies, such as Burroughs, IBM, Control Data, UNIVAC, Wang Laboratories, and Digital Equipment Corporation, were able to produce products that were programmed in machine language and/or early high-level languages such as FORTRAN and COBOL. Later, in the 1970s, more industry players such as Cray Research joined them. During this time, many of the languages that we use today were developed. While high-level programming languages and compilers were being developed to improve productivity, early efforts to map high-level languages to stored-program machines, commonly referred to as the von Neumann architecture, resulted in high memory usage and long instruction sequences. Researchers began to propose designs in which programming languages were supported with hardware features. As shown in Figure 6, the phrases programming language and hardware implementation were the No. 1 and No. 2 phrases during the period from 1973 to 1975. The concept of the Instruction Set Architecture (ISA) was pioneered by IBM in the 1960s. An ISA that is implemented across multiple hardware generations allows the same software to work across different machine generations. Binary codes from earlier IBM models such as the 1401, 7040, and 7094 were also able to run unchanged on the System/360 series. During that time, microprogramming was the primary vehicle for a processor to interpret the instructions of the ISA and execute them on the native hardware. The success of the IBM System/360 product line solidified the role of the ISA and microprogramming in the computer architecture community.

Figure 5: A long-term history of selected top topics
Figure 6: Phrases and trends, 1973-1975. (a) Word cloud visualization of topics; (b) top topics; (c) topics with the most change in coverage from 1973-75 to 1976-80.

In the early 1970s, numerous designs were proposed to improve the efficiency of running programs written in those high-level languages, but only a handful were actually implemented [2]. The Burroughs E-mode machines for Algol 60, the Burroughs B2000 for COBOL, LISP machines, and the Intel 432 for Ada are some examples. A number of ideas based on microprogramming can be found in the ISCA papers published in those years that underline these efforts, which is reflected in the No. 2 ranking of the phrase microprogramming. By 1969, the debate over virtual memory had also concluded, when IBM showed consistent performance of their virtual memory overlay system compared with manual overlays. This was also reflected in the ISCA publications of that time, where virtual memory was one of the top trends, as shown in Figure 6.

As more industry players designed computer products in the 1970s, there was also an increased interest in developing new operating systems and providing improved support for operating systems that offer reliable, secure, and usable environments for users. As a result, as shown in Figure 6c, ISCA coverage of topics in operating systems, virtual memory, and memory systems increased heading into the late 1970s. Operating systems continued to be one of the top trends in the late 1970s, as shown in Figure 7. There was also a burgeoning interest in parallel computing, with topics like data flow, parallel processing, and processing elements. Both types of topics would increase in later years.

Figure 7: Top phrases and trends, 1976-80. (a) Word cloud visualization of topics; (b) top topics; (c) topics with the most changes in coverage from 1976-80 to 1981-85.

During the 1970s, the computer industry went through a decade of innovation in the minicomputer movement. Companies such as Digital Equipment Corporation introduced minicomputers that were affordable for smaller companies and academic departments. Researchers and engineers could access these minicomputers through Cathode-Ray Tube (CRT) terminals on their desks rather than the card punchers in computing centers, which greatly improved their productivity. These minicomputers had new instruction sets that were implemented with microcode, which further stimulated the coverage of topics like CPU, instruction set, machine language, instruction execution, low cost, microprogramming, and writeable control.

With more accessibility to researchers, these minicomputers also accelerated the development of high-level languages. Given the poor performance of high-level language implementations, one of the lessons learned was that it was not only about the language but also about computation, algorithms, and memory access. Researchers began to investigate how these facets of a program could also be reflected in the ISA, which was implemented by a native hardware microarchitecture through microprogramming. Additionally, the desire to better support operating systems further motivated the introduction of sophisticated instructions that help with data movement, security, and reliability. Previously, the control unit in a processor was hardwired as combinational logic and a finite state machine. However, the need for more complex and powerful instructions made it difficult, time consuming, and costly to design such hardwired processors. And since there were also very few CAD tools for hardware development and verification, this path was less productive in the late 1970s. This, in part, further contributed to the popularity of microprogramming at that time. Microcode simplified processor design by allowing the implementation of control logic as a microcode routine stored in memory instead of a dedicated circuit.

The VAX ISA by Digital Equipment Corporation consisted of more than 300 instructions. A number of complex instructions were introduced to support high-level languages and operating systems to bridge the semantic gap. However, it was observed that compilers rarely used those instructions, and they were kept just to support "legacy codes" in low-level libraries. The VAX polynomial-evaluate and CALL instructions are examples of such instructions. As shown in Figure 7, the period between 1976 and 1980 was the time when CPU design and microprogramming were at their peak. Programming languages, operating systems, processing elements, and fault tolerance continued to receive strong attention.
In the 1980s, the computer architecture community started to embrace parallel processing and high-performance computing.

Figure 8: Top topics and trends, 1981-85. (a) Word cloud visualization of topics; (b) top topics; (c) topics with the most change in coverage from 1981-85 to 1986-90.

As shown in Figure 8, a great deal of attention by the ISCA community was paid to the interconnection network between processing elements. The driving applications behind those systems were military and scientific applications, including image processing, astrophysics, and weather prediction. Individual processors were not capable of providing the required computation speed at that time.

In industry, mini-supercomputers from Sequent and Alliant began to gain popularity. There was also a burgeoning interest in massively parallel systems such as the Connection Machine from Thinking Machines Corporation and systems from Kendall Square Research. Three major components of those multiprocessing systems were the processing elements, the shared memory, and the interconnection network. Making efficient use of multiprocessing systems required ensuring that the communication network did not become the bottleneck. A number of ideas were introduced in ISCA publications around circuit-switched networks, packet-switched networks, multi-stage networks, binary tree networks, bus traffic, resource scheduling, and routing protocols. This is reflected in the No. 1 and No. 4 rankings of the interconnect network and parallel processing topics during the early 1980s, as shown in Figure 8. During this period, challenges in parallel programming also motivated research in data flow architectures and data-driven execution, where researchers proposed hardware mechanisms to identify instructions that are ready for execution and schedule them as soon as possible. This is reflected in the No. 2 ranking of the data flow topic during this period.

Supporting shared memory in a parallel processing system has its own challenges, including scaling the system to thousands of processors without dramatically increasing memory access latency and memory interference. Memory hierarchies were proposed that include caches for prefetching and reusing data, which further required developing coherence protocols and consistency models to reduce the burden on programmers in managing data values across the system. This is reflected in the significant increase in related topic coverage from 1981-85 to 1986-90, as shown in Figure 8(c). We also observe that, as shown in Figure 9 and Figure 10, this trend continued even into the early 1990s.

In the 1980s, the RISC movement, which started with the CRAY-1 machine and the IBM 801 project, advocated ISAs with simple instructions, which resulted in uniform instructions that are easier to decode and pipeline. The debate about the advantages and disadvantages of simpler instructions made
ISA one of the top topics during the period of 1986-90, as shown in Figure 9.

The RISC designs matched well with the transistor budget of the microprocessor chips during this time. Companies began to produce chips based on new ISAs, like SPARC by Sun Microsystems, MIPS by MIPS, Inc., Spectrum by Hewlett-Packard, and the 960 by Intel. However, the simpler instruction sets also increased the pressure for more memory bandwidth for instruction fetch. Pipelining allowed the CPU clock frequency to improve much faster than that of the memory system. As a result, the memory system began to become an important bottleneck in overall system performance, and many researchers began to pay attention to memory accesses/references, virtual addresses, memory systems, main memory, and memory hierarchies.

As the number of transistors further increased over time following Moore's Law, there was also a resurgence of interest in cache design. Although caches were used extensively in mainframe computers and minicomputers in the 1970s, microprocessors only started to have barely enough transistors in the 1980s to incorporate caches on chip to mitigate the memory access bottleneck. This was reflected in the increased ISCA coverage of topics such as cache memories, instruction caches, cache hierarchy, block sizes, cache misses, and trace-driven simulation for cache performance studies.

From 1986 to 1990, researchers also published intensively on architectures that supported Prolog, a programming language for rule-based inference in artificial intelligence applications, as shown in Figure 9b. This was mostly stimulated by the increased DARPA funding in the early 1980s for AI architecture projects in response to the Japanese 5th generation AI computer project. However, the interest in AI and Prolog would soon diminish due to the lack of practical AI applications.

Figure 9: Top topics and trends, 1986-1990. (a) Word cloud visualization of topics; (b) top topics; (c) topics with the most changes in coverage from 1986-90 to 1991-95.

ISCA in the 1990s
In the early 1990s, computer architecture researchers continued to publish extensively on shared memory and shared memory multiprocessors. The increasing commercial use of database applications and scientific applications in the early 1990s further stimulated studies on shared memory server designs and message-passing clusters. Both trends eventually disrupted mainframes in the database market and the traditional vector supercomputers in the scientific computing market. However, it is also interesting to note that the coverage of these topics in ISCA would decrease dramatically in the next five-year period.

Thanks to relentless scaling according to Moore's Law, the number of transistors available to industry design teams increased to a level that allowed designers to adopt techniques that had previously been used only in mainframe computers and supercomputers. Research in the 1980s set the foundation for exception handling in processors using out-of-order and speculative execution techniques. During the 1991-95 period, as shown in Figure 10, computer architects used the newly available transistors to build high-performance processors with hardware schedulers for detecting instruction-level parallelism, performing instruction re-ordering, branch prediction, and speculative execution, together with pipelining, both to increase the clock frequency and to bridge the gap between memory latency and processing time. These innovations resulted in deeper pipelines and wider issue widths in a generation of superscalar processors.

The industry design of superscalar processors reached its peak during the late 1990s, as Intel, AMD, MIPS/SGI, Sun, IBM, and Hewlett-Packard all came up with superscalar microprocessor products in the mid 1990s. As shown in Figure 11, high-performance processor design techniques such as hardware speculation and branch prediction received significant attention from the ISCA community during 1996-2000. The superscalar processors introduced at that time included the Intel Pentium, the MIPS R8000, and the IBM Power series.

However, the increased clock frequency and execution throughput of these processors placed even more pressure on the memory system, which motivated more studies on memory access, data caches, cache sizes, block sizes, and miss rates/ratios. During this time, the computer architecture community started to converge on using trace-driven simulation based on SPEC benchmarks to study processor pipelines as well as cache memories. It is interesting to note, though, that, as shown in Figure 11(c), the coverage of instruction-level parallelism, branch prediction, superscalar processor, shared memory, and memory systems would drop significantly in the next five-year period.

The decade from 1991 to 2000 was a dark age for research in massively parallel computing systems and special-purpose acceleration hardware. The industry not only rode on Moore's Law with the exponentially increasing number of transistors, but also started to deviate from Dennard scaling, trading more power consumption for super-linear performance improvement over time. This strategy resulted in such fast advancement in microprocessor performance that it eclipsed any benefit of massively parallel computing systems or special-purpose hardware accelerators. The term "Killer Micros" became prominent, reflecting the fact that the fast-advancing microprocessor performance killed off the research and development of massively parallel computing and special-purpose hardware acceleration.
But all of this would change in the next decade, as we discuss below.

Figure 10: Top topics and trends, 1991-95. (a) Word cloud visualization of topics; (b) top topics; (c) topics with the most change in coverage from 1991-95 to 1996-2000.

Figure 11: Top topics and trends, 1996-2000. (a) Word cloud visualization of topics; (b) top topics; (c) topics with the most change in coverage from 1996-2000 to 2001-05.
The fact that SPEC became the top topic phrase for the period of 2001-05 indicates that by this time the computer architecture community had fully embraced quantitative approaches to computer architecture research. We observe that SPEC CPU2000, or its predecessor SPEC CPU95, had become the de facto standard for measuring processor and/or memory-hierarchy performance in the 2000s. The benchmarks were used to measure a wide variety of system designs, including multiprocessing systems with multi-level memory hierarchies and memory address translation overheads in cloud and server applications, during a period when the internet was coincidentally becoming popular.

The early 2000s also saw the peak of speculation techniques in superscalar processors, VLIW processors, and memory systems. The industry was producing VLIW/EPIC processors such as the Intel Itanium. Computer architecture researchers were publishing extensively on architectural support for compile-time control speculation and data speculation. Researchers also published extensively on register file design for both VLIW/EPIC and wide-issue superscalar processors. These processors require a very large number of ports to support simultaneous accesses to the register file by many instructions at different stages of the processor pipeline. This triggered the coverage of large register files with multiple read and write ports. A number of ISCA publications at that time looked at various aspects of register file design, including their organization, access time, power consumption, and cost.

From 1995 to 2005, the industry had been achieving super-linear scaling of high-performance designs, especially in clock frequency, at the cost of increasing power consumption. As we mentioned earlier, this was accomplished by deviating from the Dennard scaling principle of linearly scaling the performance in each generation of technology while keeping power consumption constant. By 2005, the power consumption of microprocessors had reached the limit of practical heat dissipation mechanisms. As a result, computer architecture researchers began to focus on energy efficiency, which would be one of the highly ranked ISCA topics with the most increased coverage from 2001-05 to 2006-10, as shown in Figure 12(c). On the other hand, the coverage of superscalar processors and register files would drop significantly in the next five-year period. It is interesting to note that Figure 12(c) presents one of the most dramatic shifts of topic coverage in ISCA history.

The period of 2006-2010 was the start of the era of chip multiprocessors, a.k.a. multicore processors. Before the availability of commercial multicore processors, superscalar processors were packed with more and more functional units, and instructions were dispatched to the available functional units. However, this way of scaling performance hit a practical barrier as industry design teams struggled to exploit sufficient instruction-level parallelism to productively utilize the additional functional units. The industry made a major pivot from uni-processor clock frequency and instruction-level parallelism scaling to multicore scaling around 2003. The clock frequency and instruction-level parallelism of each CPU core would largely remain the same, whereas the number of cores would increase over time.
In fact, in some designs, the clock frequency might even be reduced to lower power consumption and accommodate more cores within a given power budget. This turn away from clock frequency and instruction-level processing was reflected in the reduced coverage of topics like superscalar processor and register file from 2001-05 to 2006-10, as shown in Figure 12(c).

Figure 12: Top topics and trends, 2001-2005. (a) Word cloud visualization of topics; (b) top topics; (c) topics with the most change in coverage from 2001-05 to 2006-10.

Figure 13: Top topics and trends, 2006-10. (a) Word cloud visualization of topics; (b) top topics; (c) topics with the most change in coverage from 2006-10 to 2011-15.

IBM launched the Power4, the first dual-core processor, in 2001. Compaq developed the Piranha system for high-performance servers by integrating eight simple Alpha processor cores along with a two-level cache hierarchy onto a single chip. Sun Microsystems launched Niagara, an eight-core web server CPU, in 2005. On-chip multiprocessing systems were thus studied in detail by the ISCA community in varying contexts, including cache hierarchies, power consumption, communication overheads, thread-to-core assignment for Simultaneous Multithreading (SMT), soft errors under low voltage, cache coherence protocols, interconnect networks, QoS, task scheduling, and power management. The strong interest from the ISCA research community in supporting this movement was reflected in the high coverage of topics such as chip multiprocessors, multicore processors, power consumption, and energy efficiency during the period of 2006-10, as shown in Figure 13. It is also interesting to note that the term chip multiprocessor gave way to multicore processor, which is reflected in the drop of its coverage from 2006-10 to 2011-15, as shown in Figure 13(c).

During the 2011-15 period, as shown in Figure 14, power consumption and energy efficiency became the top topics for ISCA authors. Meanwhile, advances in mobile devices and internet technology fueled exponential growth in the tech industry, with a variety of applications that generated huge amounts of data. GPUs with hundreds of processing cores, already in use by the industry for graphics processing, proved to be high-throughput devices for processing large amounts of data. They became more general purpose with the introduction of the CUDA programming model in 2007. The GPU equivalent of CPU cores, the Streaming Multiprocessors, run at about half the clock frequency of CPU cores to achieve higher energy efficiency. The savings in power consumption enabled GPU designers to provision much higher memory bandwidth and thread-level parallelism. A major challenge was to program these massively parallel processors.

A movement to empower application developers to develop parallel applications started in 2007 with libraries and education materials from NVIDIA and academic institutions such as the University of Illinois at Urbana-Champaign, the University of California, Davis, and the University of Tennessee, Knoxville. By 2011, there was strong momentum in GPU libraries and applications in high-performance computing. During the period of 2011-15, China, the US, and Japan began to build top supercomputers based on CUDA GPUs. Examples were Tianhe-1 in China, Tsubame at Tokyo Tech, Titan at Oak Ridge National Laboratory, and Blue Waters at the University of Illinois at Urbana-Champaign.

These powerful GPU solutions and the CUDA programming model support also paved the way for machine learning. Using CUDA GPUs, a team from the University of Toronto trained AlexNet on 1.2 million images and won the ImageNet competition in 2012 by a large margin over the second-place team. This victory ignited wide interest in neural networks for computer vision and other cognitive applications.
NVIDIA used this opportunity to capitalize on its investment and quickly developed cuDNN. Several other fields, such as personalized medicine, genomics, physics, and economics, also realized that GPUs could help digest large amounts of existing data for scientific breakthroughs. Some of the Top 10 supercomputers were also equipped with GPUs. However, power consumption for large computing clusters was still a bottleneck. In 2011-15, power consumption was the biggest concern, and the number of ISCA publications addressing this challenge in the context of data centers increased. Research ideas and solutions were proposed by the ISCA community, from both industry and academia, at different layers of the computing stack: circuits, architecture, and algorithms.

Figure 14: Top topics and trends, 2011-2015. (a) Word cloud visualization of topics; (b) top topics; (c) topics with the most change in coverage from 2011-15 to 2016-18.

Figure 15: Top topics and trends, 2016-18. (a) Visualization of phrases; (b) top topics.

In the most recent period of 2016-18, we saw that machine learning based on neural networks has made its way into many applications and has become part of real-life systems. In Figure 15a, we see this influence in the ISCA publication trends. Computer architects acted quickly and addressed the related challenges of building machines that can process cognitive workflows efficiently. Naturally, because of the increasing interest in machine learning and neural networks, GPUs have gained tremendous attention from the ISCA community. GPUs have made training and learning more practical in terms of time and energy consumption. The desire to train more models has motivated the development of specialized hardware for tensor processing, referred to as TPUs and Tensor Cores.

However, processing large amounts of data with GPUs and tensor processing hardware is efficient only when there is enough reuse of data, primarily because of the memory wall. For applications with random data access patterns and poor cache hit rates, GPUs are not suitable. Even for applications with regular access patterns, what we have witnessed so far is under-utilized GPUs because of insufficient reuse in the applications. As data grows exponentially, we should also expect exponential growth in storage requirements, data movement, and energy consumption. All of these concerns are clearly visible in the ISCA publication trends during 2016-18. None of the existing technologies has been shown to scale to process this large amount of data within the required power and throughput budgets. Radical changes are required from top to bottom, in both software and hardware.
One of the key questions for the computer architecture community in the coming decade is how computer architects will address the challenge of scaling performance by 100x while staying within the required power and cost budgets. For storing the large amounts of data to be processed, we expect there may be a departure from existing storage devices to relatively lower-cost and lower-latency storage solutions. SSDs are already used widely in consumer electronics and are now making their way into high-performance computing solutions. Another trend is for main memory to be extended through 3D integration. Today, top-of-the-line NVIDIA GPUs are already equipped with high-bandwidth stacked DRAM. In order to restrict the amount of data movement, an upcoming trend can be observed towards in-memory-computing and near-memory-computing solutions.

In a few years, we expect to witness a compute hierarchy parallel to the existing memory hierarchy. However, a million-dollar question is how to address the complexity of compute logic at different levels of the compute hierarchy given a variety of applications. One of the solutions along this line is to put a logic layer underneath a stacked DRAM, which has captured a lot of attention from industry and academia. However, what compute logic goes where in the compute hierarchy is still an open research question. We will also likely witness parallelism through a massive number of simple, energy-efficient processing cores running at lower clock frequencies, distributed not only across all the levels of the memory hierarchy but also at multiple levels within a memory die, to reach bandwidth at the scale of TB/s. One fundamental idea is to restrict data movement to local compute units when possible, increase parallelism, lower power consumption, and achieve high bandwidth and high throughput.

We will continue to see accelerators working alongside general-purpose processors. In the end, the whole system would consist of heterogeneous computing cores interconnected via an interconnection fabric. One of the daunting challenges is to program such devices, which requires researchers to rethink the whole software stack to ease programmers' lives. Since the late 1970s, we have been developing high-level languages. We think that computer scientists are now going to have another iteration of this design cycle. We will see the development of even higher-level languages, including domain-specific languages (DSLs), and new compilers that support a variety of those domain-specific languages and hardware platforms using a hierarchy of intermediate representations. In the next few years, we will also likely see an increasing interest in using machine learning as a tool to move from hand-crafted rules to semi-automation. It may help compilers, schedulers, branch predictors, and other computing system components learn from the behavior of applications running in the cloud or at the edge. However, the benefits of such a paradigm shift are still to be investigated.
This study began with a research project, called DISCVR, conducted at the IBM-ILLINOIS Center for Cognitive Computing Systems Research (c3sr.com). The goal of DISCVR was to build a practical NLP-based AI solution pipeline to process large numbers of PDF documents and evaluate the end-to-end NLP capabilities in understanding large amounts of unstructured data such as PDF files. This in turn helps us better understand the computation patterns and requirements of modern AI solutions on the underlying computing systems. While building such a prototype, an early use case came to us thanks to the 2017 IEEE/ACM International Symposium on Microarchitecture (MICRO-50) Program Co-chairs, Drs. Hillery Hunter and Jaime Moreno. They asked us whether we could perform a data-driven analysis of the past 50 years of MICRO papers and show some interesting historical perspectives on MICRO's 50 years of publication. Because of the limited amount of time, we were only able to produce some preliminary results, which we delivered as an invited talk during that year's MICRO opening reception. It generated some interesting discussions, but the lack of insights in those early results limited the usefulness of that work. That undertaking has, however, planted a seed in our C3SR center. We learned two important lessons from that experience: (1) building an AI solution to truly understand unstructured data is hard, in spite of the many claimed successes in natural language understanding; and (2) providing a data-driven perspective on computer architecture research is a very interesting and fun project.

Since then, we have continued to push those two frontiers of research at the C3SR center. On the first topic, we built a prototype paper-review matching system, called the C3SR System for Reviewer Assignment (CSRA), and used that system to help the Program Co-chairs of ISCA 2019, Drs. Hillery Hunter and Erik Altman, with their paper review assignment task. On the second topic, we decided to conduct a more thorough study based on all past ISCA papers, which resulted in this article.

We recognize that we have just scratched the surface of natural language understanding of unstructured data, and there are many more aspects that we can improve (and we are still working on them). But even with our current study, we felt there were enough interesting findings to be worth sharing with the community. Hence we decided to write this article to summarize our findings so far based only on ISCA publications. Our hope is to generate further interest from the community in this topic, and we welcome collaboration from the community to deepen our understanding both of computer architecture research and of the challenges of NLP-based AI solutions. For example, a similar study can be conducted for other conferences (such as MICRO) and in other research areas (such as CVPR and SIGKDD).
We would like to thank Mr. Abdul Dakkak and Ms. Cheng Li from the C3SR center for their early work on DISCVR and their contribution to the MICRO-50 data analysis. We would also like to thank Mr. Jong Yoon Lee, Ms. Hongyu Gong, and Ms. Renee Zhu from the C3SR center for their contributions to the CSRA project, which had a direct impact on the data analysis used in this article. We would also like to thank Drs. Hillery Hunter, Jaime Moreno, and Erik Altman from IBM Research for their encouragement and feedback on our work.