Jongsoo Park
Stanford University
Publication
Featured research published by Jongsoo Park.
IEEE Computer | 2008
William J. Dally; James D. Balfour; D. Black-Schaffer; J. Chen; R.C. Harting; Vishal Parikh; Jongsoo Park; D. Sheffield
Hardwired ASICs, 50X more efficient than programmable processors, sacrifice programmability to meet the efficiency requirements of demanding embedded systems. Programmable processors use energy mostly to supply instructions and data to the arithmetic units, and several techniques can reduce instruction- and data-supply energy costs. Using these techniques in the Stanford ELM processor closes the gap with ASICs to within 3X.
IEEE Computer Architecture Letters | 2008
James D. Balfour; William J. Dally; David Black-Schaffer; Vishal Parikh; Jongsoo Park
We present an efficient programmable architecture for compute-intensive embedded applications. The processor architecture uses instruction registers to reduce the cost of delivering instructions, and a hierarchical and distributed data register organization to deliver data. Instruction registers capture instruction reuse and locality in inexpensive storage structures that are located near the functional units. The data register organization captures reuse and locality in different levels of the hierarchy to reduce the cost of delivering data. Exposed communication resources eliminate pipeline registers and control logic, and allow the compiler to schedule efficient instruction and data movement. The architecture keeps a significant fraction of instruction and data bandwidth local to the functional units, which reduces the cost of supplying instructions and data to large numbers of functional units. This architecture achieves an energy efficiency that is 23x greater than an embedded RISC processor.
Neurosurgery | 2011
Robert T. Arrigo; Paul Kalanithi; Ivan Cheng; Todd Alamin; Eugene J. Carragee; Stefan A. Mindea; Jongsoo Park; Maxwell Boakye
BACKGROUND: Surgery for spinal metastasis is a palliative treatment aimed at improving patient quality of life by alleviating pain and reversing or delaying neurologic dysfunction, but with a mean survival time of less than 1 year and significant complication rates, appropriate patient selection is crucial. OBJECTIVE: To identify the most significant prognostic variables of survival after surgery for spinal metastasis. METHODS: Chart review was performed on 200 surgically treated spinal metastasis patients at Stanford Hospital between 1999 and 2009. Survival analysis was performed and variables entered into a Cox proportional hazards model to determine their significance. RESULTS: Median overall survival was 8.0 months, with a 30-day mortality rate of 3.0% and a 30-day complication rate of 34.0%. A Cox proportional hazards model showed radiosensitivity of the tumor (hazard ratio: 2.557, P < .001), preoperative ambulatory status (hazard ratio: 2.355, P = .0001), and Charlson Comorbidity Index (hazard ratio: 2.955, P < .01) to be significant predictors of survival. Breast cancer had the best prognosis (median survival, 27.1 months), whereas gastrointestinal tumors had the worst (median survival, 2.66 months). CONCLUSION: We identified the Charlson Comorbidity Index score as one of the strongest predictors of survival after surgery for spinal metastasis. We confirmed previous findings that radiosensitivity of the tumor and ambulatory status are significant predictors of survival.
ACM Symposium on Parallel Algorithms and Architectures | 2010
Jongsoo Park; William J. Dally
We present a scheduling algorithm for stream programs on multi-core architectures called team scheduling. Compared to previous multi-core stream scheduling algorithms, team scheduling achieves 1) similar synchronization overhead, 2) coverage of a larger class of applications, 3) better control over buffer space, 4) deadlock-free feedback loops, and 5) lower latency. We compare team scheduling to the latest stream scheduling algorithm, sgms, by evaluating 14 applications on a multi-core architecture with 16 cores. Team scheduling successfully targets applications that cannot be validly scheduled by sgms due to excessive buffer requirements or deadlocks in feedback loops (e.g., gsm and w-cdma). For applications that can be validly scheduled by sgms, team scheduling shows on average 37% higher throughput within the same buffer space constraints.
Design, Automation, and Test in Europe | 2007
Jongsoo Park; Sung-Boem Park; James D. Balfour; David Black-Schaffer; Christos Kozyrakis; William J. Dally
Conventional register file architectures cannot optimally exploit temporal locality in data references due to their limited capacity and static encoding of register addresses in instructions. In conventional embedded architectures, the register file capacity cannot be increased without resorting to longer instruction words. Similarly, loop unrolling is often required to exploit locality in the register file accesses across iterations because naming registers statically is inflexible. Both optimizations lead to significant code size increases, which is undesirable in embedded systems. In this paper, the authors introduce the register pointer architecture (RPA), which allows registers to be accessed indirectly through register pointers. Indirection allows a larger register file to be used without increasing the length of instruction words. Additional register file capacity allows many loads and stores, such as those introduced by spill code, to be eliminated, which improves performance and reduces energy consumption. Moreover, indirection affords additional flexibility in naming registers, which reduces the need to apply loop unrolling in order to maximize reuse of register-allocated variables.
IEEE Computer Architecture Letters | 2008
David Black-Schaffer; James D. Balfour; William J. Dally; Vishal Parikh; Jongsoo Park
This paper analyzes a range of architectures for efficient delivery of VLIW instructions for embedded media kernels. The analysis takes an efficient filter cache as a baseline and examines the benefits from 1) removing the tag overhead, 2) distributing the storage, 3) adding indirection, 4) adding efficient NOP generation, and 5) sharing instruction memory. The result is a hierarchical instruction register organization that provides a 56% energy and 40% area savings over an already efficient filter cache.
Compilers, Architecture, and Synthesis for Embedded Systems | 2010
Jongsoo Park; James D. Balfour; William J. Dally
We present a fine-grain dynamic instruction placement algorithm for small L0 scratch-pad memories (SPMs), whose unit of transfer can be an individual instruction. Our algorithm captures a large fraction of instruction reuse missed by coarse-grain placement algorithms whose unit of transfer is restricted to loops or functions within the capacity of SPMs. Evaluation of L0 SPMs with our fine-grain algorithm in 17 applications shows that the energy consumed by the instruction storage hierarchy is reduced by 38% and 31% compared to that of L0 instruction caches and L0 SPMs with an ideal coarse-grain algorithm, respectively.
Archive | 2008
Jongsoo Park; Tzishing Jesse Lim
Archive | 2011
Jongsoo Park; William J. Dally
Neurology | 2014
Leslie Lee; S. Cho; Viet Nguyen; John K. Ratliff; Jongsoo Park; Jaime R. Lopez