Publication


Featured research published by P D Sai Manoj.


IEEE International 3D Systems Integration Conference | 2013

High-speed and low-power 2.5D I/O circuits for memory-logic-integration by through-silicon interposer

Jiacheng Wang; Shunli Ma; P D Sai Manoj; Mingbin Yu; Roshan Weerasekera; Hao Yu

In this paper, two high-speed and low-power I/O circuits are developed using a through-silicon interposer (TSI) for 2.5D integration of a multi-core processor and memory in a 65 nm CMOS process. For a 3 mm TSI transmission-line (T-line) interconnect, the first I/O circuit is a low-voltage differential signaling (LVDS) buffer and the second is a current-mode logic (CML) buffer. To compensate for the high-frequency loss of the T-line, a pre-emphasis circuit is deployed in the LVDS buffer, and wide-band inductor matching is deployed in the CML buffer. Post-layout simulation results show that the LVDS buffer achieves a 360 mV peak-to-peak differential output swing and 563 fs cycle-to-cycle jitter at 10 Gb/s bandwidth with 4.8 mW power consumption. The CML buffer achieves a 240 mV peak-to-peak differential output swing and 453 fs jitter at a 12.8 Gb/s data rate with 1.6 mA current consumption from a 0.6 V ultra-low-power supply.
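Pre-emphasis counters T-line loss by boosting signal transitions, which carry most of the high-frequency content the line attenuates. A minimal discrete-time sketch, assuming a generic two-tap FIR model with a hypothetical post-cursor weight `alpha` and NRZ levels of ±1; it is not the authors' transistor-level LVDS circuit.

```python
from typing import List, Sequence


def pre_emphasis(bits: Sequence[int], alpha: float = 0.25, swing: float = 1.0) -> List[float]:
    """Two-tap FIR pre-emphasis: y[n] = swing * (x[n] - alpha * x[n-1]).

    bits: 0/1 symbols, mapped to NRZ levels -1/+1.
    alpha: hypothetical de-emphasis weight of the post-cursor tap.
    """
    if not bits:
        return []
    levels = [1.0 if b else -1.0 for b in bits]
    out: List[float] = []
    prev = levels[0]  # assume the line idles at the first symbol's level
    for x in levels:
        # A transition (x != prev) adds alpha; a repeated bit subtracts it.
        out.append(swing * (x - alpha * prev))
        prev = x
    return out


print(pre_emphasis([1, 1, 0, 0, 1]))  # [0.75, 0.75, -1.25, -0.75, 1.25]
```

Repeated bits settle at ±0.75 while bits that follow a transition are driven to ±1.25, so the edges are emphasized relative to the flat portions of the waveform.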


IEEE Transactions on Computers | 2015

3D Many-Core Microprocessor Power Management by Space-Time Multiplexing Based Demand-Supply Matching

P D Sai Manoj; Hao Yu; Kanwen Wang

A reconfigurable power switch network is proposed to perform demand-supply-matched power management between 3D-integrated microprocessor cores and power converters. The power switch network physically connects cores and converters through 3D through-silicon vias (TSVs). Space-time multiplexing (STM) is achieved by configuring the power switch network and is realized by learning and classifying the power signatures of workloads. By classifying workloads according to the magnitude and phase of their power signatures, STM can be performed with the minimum number of converters allocated to each cluster of cores. Furthermore, demand-response-based workload scheduling is performed to reduce peak power and balance the workload. The proposed power management is verified with system models using physical design parameters and benchmark power traces of workloads. For a 64-core case, experimental results show a 40.53 percent peak-power reduction, a 2.50x improvement in workload balance, and a 42.86 percent reduction in the required number of power converters compared with power management without STM.
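Space-time multiplexing lets cores whose power peaks are out of phase share one converter. The sketch below illustrates only that sharing criterion: it swaps in a greedy first-fit-decreasing packing of per-core power traces for the paper's learned power-signature classification, and the traces and converter capacity are hypothetical.

```python
from typing import List


def allocate_converters(traces: List[List[float]], capacity: float) -> List[List[int]]:
    """Greedy first-fit-decreasing assignment of cores to shared converters.

    traces[i][t] is the (hypothetical) power demand of core i in time slot t.
    A converter may serve a group of cores as long as their summed demand stays
    within `capacity` in every slot, so out-of-phase cores can share it.
    """
    n_slots = len(traces[0]) if traces else 0
    # Visit cores in order of decreasing peak demand (FFD heuristic).
    order = sorted(range(len(traces)), key=lambda i: max(traces[i]), reverse=True)
    groups: List[List[int]] = []
    loads: List[List[float]] = []  # summed demand per converter, per slot
    for i in order:
        if max(traces[i]) > capacity:
            raise ValueError(f"core {i} exceeds a single converter's capacity")
        for g, load in enumerate(loads):
            if all(load[t] + traces[i][t] <= capacity for t in range(n_slots)):
                groups[g].append(i)
                for t in range(n_slots):
                    load[t] += traces[i][t]
                break
        else:  # no existing converter fits: allocate a new one
            groups.append([i])
            loads.append(list(traces[i]))
    return groups


traces = [[1.0, 0.5], [0.5, 1.0], [1.0, 0.5], [0.5, 1.0]]
print(allocate_converters(traces, capacity=1.5))  # [[0, 1], [2, 3]]
```

Sizing by summed peaks would need ceil(4.0 / 1.5) = 3 converters for these four cores, whereas the time-aware packing pairs each core with an opposite-phase partner and needs only two.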


IEEE Design & Test of Computers | 2015

A 2.5-D Memory-Logic Integration With Data-Pattern-Aware Memory Controller

Dongjun Xu; P D Sai Manoj; Kanwen Wang; Hao Yu; Ningmei Yu; Mingbin Yu

This paper presents silicon-interposer-based 2.5-D integration of core and memory chips. Utilization of the channels between the core and memory chips, formed by TSVs and interposer routing, is maximized by bandwidth balancing, which is enabled by space-time multiplexing of the channels together with core clustering.
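Bandwidth balancing can be pictured as spreading the demand of core clusters evenly over the available interposer channels. A minimal sketch using the longest-processing-time (LPT) greedy heuristic, which is a stand-in rather than the paper's space-time multiplexing scheme; the cluster names and bandwidth demands are hypothetical.

```python
import heapq
from typing import Dict, List


def balance_channels(cluster_bw: Dict[str, float], n_channels: int) -> List[List[str]]:
    """Assign core clusters to channels so that per-channel bandwidth is balanced.

    LPT greedy: place the most demanding remaining cluster on the currently
    least-loaded channel. cluster_bw values are hypothetical demands (e.g. GB/s).
    """
    heap = [(0.0, ch) for ch in range(n_channels)]  # (current load, channel index)
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(n_channels)]
    for name, bw in sorted(cluster_bw.items(), key=lambda kv: kv[1], reverse=True):
        load, ch = heapq.heappop(heap)
        assignment[ch].append(name)
        heapq.heappush(heap, (load + bw, ch))
    return assignment


print(balance_channels({"A": 4.0, "B": 3.0, "C": 3.0, "D": 2.0}, 2))  # [['A', 'D'], ['B', 'C']]
```

Here the two channels end up with equal 6-unit loads. LPT is a heuristic and does not guarantee the optimal split in general.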


Design Automation Conference | 2013

Peak power reduction and workload balancing by space-time multiplexing based demand-supply matching for 3D thousand-core microprocessor

P D Sai Manoj; Kanwen Wang; Hao Yu

Space-time multiplexing is used for demand-supply matching between many-core microprocessors and power converters. Adaptive clustering is developed to classify cores by similar power level in space and similar power behavior in time. In each power management cycle, the minimum number of power converters is allocated for space-time multiplexed matching, which is physically enabled by 3D through-silicon vias. Moreover, demand-response-based task adjustment is applied to reduce peak power and balance the workload. The proposed power management system is verified with system models using physical design parameters and benchmark power traces; the results show a 38.10% peak-power reduction and a 2.60x improvement in workload balance.
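Demand-response task adjustment lowers the peak by postponing part of the load when aggregate demand is high. The sketch assumes the simplest such policy, carrying any demand above a hypothetical power cap into later time slots; the abstract does not specify the authors' adjustment rule.

```python
from typing import List, Tuple


def demand_response(demand: List[float], cap: float) -> Tuple[List[float], float]:
    """Defer load above `cap` to later slots (simple peak shaving).

    demand[t] is the hypothetical aggregate power requested in slot t. Excess
    demand is carried forward, so total work is conserved; anything still
    pending after the last slot is returned as the backlog.
    """
    served: List[float] = []
    backlog = 0.0
    for d in demand:
        want = d + backlog      # new demand plus previously deferred work
        run = min(want, cap)    # never exceed the cap in this slot
        served.append(run)
        backlog = want - run
    return served, backlog


served, backlog = demand_response([4.0, 1.0, 5.0, 1.0, 1.0], cap=3.0)
print(served, backlog)  # [3.0, 2.0, 3.0, 3.0, 1.0] 0.0
```

The peak drops from 5 to 3 while the total work (12 units) is unchanged and no backlog remains; the cost is that some work finishes one slot later.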


Custom Integrated Circuits Conference | 2015

A scalable and reconfigurable 2.5D integrated multicore processor on silicon interposer

Jie Lin; Shikai Zhu; Zhiyi Yu; Dongjun Xu; P D Sai Manoj; Hao Yu

This paper presents a novel 2.5D multicore processor that consists of three distinct silicon dies: a processor die with 8 MIPS cores, a 16 kB SRAM die, and an accelerator die for multimedia and communication applications. These dies can be interconnected in multiple modes, such as core-core (up to 32 cores), core-memory (4x storage capacity), and core-accelerator (4.4x speedup in H.264 decoding), to establish a scalable and reconfigurable platform with lower tape-out die-area cost. A pair of 8 Gbps SerDes is custom-designed for each of the 12 inter-die communication channels, achieving a 2.5D I/O bandwidth of 24 GB/s. The processor was implemented in a GF 65 nm process and operates at 500 MHz from a 1.2 V supply, with 1.08 W power dissipation.
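The quoted 24 GB/s is consistent with counting both SerDes of each pair: 12 channels × 2 × 8 Gbps = 192 Gbps = 24 GB/s. A quick check, assuming 8 bits per byte and no line-coding overhead:

```python
def aggregate_io_bandwidth_gbytes(channels: int, serdes_per_channel: int,
                                  gbps_per_serdes: float) -> float:
    """Aggregate inter-die bandwidth in GB/s (8 bits per byte, no coding overhead assumed)."""
    return channels * serdes_per_channel * gbps_per_serdes / 8


print(aggregate_io_bandwidth_gbytes(12, 2, 8.0))  # 24.0
```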


IEEE Transactions on Computers | 2016

A Q-Learning Based Self-Adaptive I/O Communication for 2.5D Integrated Many-Core Microprocessor and Memory

P D Sai Manoj; Hao Yu; Hantao Huang; Dongjun Xu

A self-adaptive output-voltage swing adjustment is introduced in the design of energy-efficient I/O communication for a 2.5D integrated many-core microprocessor and memory. Instead of transmitting signals with a large voltage swing, a Q-learning-based I/O management scheme adaptively adjusts the I/O output-voltage swing under constraints on both communication power and bit error rate (BER). Simulation results show that the proposed adaptive 2.5D I/Os (in 65 nm CMOS) achieve an average of 12.5 mW I/O power, 4 GHz bandwidth, and 3.125 pJ/bit energy efficiency for one channel at a BER of 10⁻⁶. Compared with I/O communication using a uniform output-voltage swing, conventional Q-learning and accelerated Q-learning achieve power reductions of 12.95 and 18.89 percent and energy-efficiency improvements of 14 and 15.11 percent, respectively.
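A minimal tabular Q-learning sketch of the swing-adjustment idea. Everything about the environment is an assumption for illustration: the discrete swing levels, a Gaussian-noise NRZ BER model, I/O power proportional to swing, and a fixed penalty for violating the 10⁻⁶ BER target. The paper's actual state, action, and reward definitions are not given in the abstract.

```python
import math
import random
from typing import List

SWINGS_MV = [100, 150, 200, 250, 300, 350, 400]  # hypothetical swing levels
ACTIONS = (-1, 0, 1)   # lower, keep, or raise the swing by one level
NOISE_MV = 20.0        # hypothetical RMS receiver noise
BER_TARGET = 1e-6
PENALTY = -10.0        # hypothetical reward when the BER target is violated


def ber(swing_mv: float) -> float:
    """Assumed NRZ/Gaussian model: BER = Q(A / sigma) with A = swing / 2."""
    return 0.5 * math.erfc((swing_mv / 2) / (NOISE_MV * math.sqrt(2)))


def reward(level: int) -> float:
    swing = SWINGS_MV[level]
    power = swing / SWINGS_MV[-1]  # assume I/O power proportional to swing
    return -power if ber(swing) <= BER_TARGET else PENALTY


def step(level: int, action: int) -> int:
    return min(max(level + ACTIONS[action], 0), len(SWINGS_MV) - 1)


def train(episodes: int = 500, steps: int = 20, alpha: float = 0.5,
          gamma: float = 0.9, eps: float = 0.1, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in SWINGS_MV]
    for _ in range(episodes):
        s = rng.randrange(len(SWINGS_MV))
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            s2 = step(s, a)
            # Off-policy TD (Q-learning) update.
            q[s][a] += alpha * (reward(s2) + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q


def greedy_policy(q: List[List[float]]) -> List[int]:
    return [max(range(len(ACTIONS)), key=lambda a: q[s][a]) for s in range(len(q))]
```

Under these assumptions the greedy policy learned by `train()` should move the swing to 200 mV, the lowest level whose modeled BER meets the target, and hold it there. Because the environment is deterministic and Q starts at 0 (above every achievable negative return), untried actions look attractive and are explored without a separate exploration schedule.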


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2016

A Zonotoped Macromodeling for Eye-Diagram Verification of High-Speed I/O Links With Jitter and Parameter Variations

Leibin Ni; P D Sai Manoj; Yang Song; Chenjie Gu; Hao Yu

It is challenging to efficiently evaluate the performance bound of high-precision analog circuits with input and parameter variations at the nanoscale. Using zonotopes to model the uncertainty of the input data pattern (or jitter) and of multiple parameters, this paper develops a reachability-based verification method to compute the worst-case eye diagram. The proposed zonotope-based reachability analysis can consider both spatial and temporal variations in a single simulation. Moreover, nonlinear zonotoped macromodeling is developed to reduce the computational complexity. Performance bounds for I/O links under parameter variations are evaluated. In addition, eye diagrams are generated with the proposed zonotoped macromodel for performance evaluation considering both temporal and spatial variations. Experiments on eye-diagram verification of high-speed I/O links show that the zonotoped macromodel achieves up to a 450× speedup over Monte Carlo simulation of the original model, with small error at the specified macromodel order.
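Zonotopes make reachability cheap because linear maps and Minkowski sums act directly on the center and generators. A minimal sketch of linear zonotope reachability for a discrete-time system x[k+1] = A x[k] + B u[k] with a bounded input; the system, the numbers, and the absence of generator-order reduction are simplifying assumptions, and the paper's nonlinear macromodel is not reproduced.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(a: Matrix, v: Vector) -> Vector:
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]


class Zonotope:
    """Set {c + sum_k beta_k * g_k : beta_k in [-1, 1]} with center c and generators g_k."""

    def __init__(self, center: Vector, generators: List[Vector]):
        self.c = center
        self.gens = generators

    def linear_map(self, a: Matrix) -> "Zonotope":
        # A * Z is again a zonotope: map the center and every generator.
        return Zonotope(mat_vec(a, self.c), [mat_vec(a, g) for g in self.gens])

    def minkowski_sum(self, other: "Zonotope") -> "Zonotope":
        # Z1 (+) Z2: add the centers and concatenate the generators.
        c = [x + y for x, y in zip(self.c, other.c)]
        return Zonotope(c, self.gens + other.gens)

    def interval_hull(self) -> List[Tuple[float, float]]:
        # Tight per-dimension bounds: c_i -/+ sum_k |g_k[i]|.
        r = [sum(abs(g[i]) for g in self.gens) for i in range(len(self.c))]
        return [(ci - ri, ci + ri) for ci, ri in zip(self.c, r)]


def reach(a: Matrix, b: Matrix, x0: Zonotope, u: Zonotope,
          steps: int) -> List[List[Tuple[float, float]]]:
    """Per-step bounds of X[k+1] = A X[k] (+) B U for k = 1..steps."""
    x, bounds = x0, []
    for _ in range(steps):
        x = x.linear_map(a).minkowski_sum(u.linear_map(b))
        bounds.append(x.interval_hull())
    return bounds


bounds = reach([[0.5]], [[1.0]], Zonotope([0.0], []), Zonotope([1.0], [[0.25]]), 3)
print(bounds[-1])  # [(1.3125, 2.1875)]
```

For the scalar system x[k+1] = 0.5 x[k] + u[k] with u in [0.75, 1.25] and x[0] = 0, one propagation yields the worst-case envelope at every step, with no sampling. In an eye-diagram setting, the analogous bounds on the "0" and "1" trajectories would bound the worst-case eye opening.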


International Symposium on Low Power Electronics and Design | 2014

An energy-efficient 2.5D through-silicon interposer I/O with self-adaptive adjustment of output-voltage swing

Dongjun Xu; P D Sai Manoj; Hantao Huang; Ningmei Yu; Hao Yu

A self-adaptive output swing adjustment is introduced for the design of energy-efficient 2.5D through-silicon interposer (TSI) I/Os. Instead of transmitting signals with a large voltage swing, Q-learning-based self-adaptive adjustment is deployed to tune the I/O output-voltage swing under constraints on both power budget and bit error rate (BER). Experimental results show that the adaptive 2.5D TSI I/Os designed in 65 nm CMOS achieve an average of 13 mW I/O power, 4 GHz bandwidth, and 3.25 pJ/bit energy efficiency for one channel at a BER of 10⁻⁶, corresponding to a 21.42% power reduction and a 14.47% energy-efficiency improvement.
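The reported energy efficiency matches average I/O power divided by the per-channel data rate, if the 4 GHz bandwidth is read as 4 Gb/s (an assumption). Since 1 mW / 1 Gb/s = 10⁻³ J/s ÷ 10⁹ bit/s = 1 pJ/bit, the check is a single division:

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit in pJ/bit: 1 mW / 1 Gb/s = 1e-3 J/s / 1e9 bit/s = 1e-12 J/bit."""
    return power_mw / rate_gbps


print(energy_per_bit_pj(13.0, 4.0))  # 3.25
print(energy_per_bit_pj(12.5, 4.0))  # 3.125
```

The 12.5 mW case reproduces the 3.125 pJ/bit quoted in the related Q-learning abstracts.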


International Conference on Computer-Aided Design | 2014

Reinforcement learning based self-adaptive voltage-swing adjustment of 2.5D I/Os for many-core microprocessor and memory communication

Huang Hantao; P D Sai Manoj; Dongjun Xu; Hao Yu; Zhigang Hao

A reinforcement-learning-based I/O management scheme is developed for energy-efficient communication between a many-core microprocessor and memory. Instead of transmitting data with a fixed, large voltage swing, an online Q-learning algorithm performs self-adaptive voltage-swing control of 2.5D through-silicon interposer (TSI) I/O circuits. The voltage-swing adjustment is formulated as a Markov decision process (MDP) and solved by model-free reinforcement learning under constraints on both power budget and bit error rate (BER). Experimental results show that the adaptive 2.5D TSI I/Os designed in 65 nm CMOS achieve an average of 12.5 mW I/O power, 4 GHz bandwidth, and 3.125 pJ/bit energy efficiency for one channel at a BER of 10⁻⁶, with an average power saving of 18.89% and an average energy-efficiency improvement of 15.11%.
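For reference, the standard tabular Q-learning update used to solve an MDP without a model of its transitions is shown below, together with an illustrative reward for this setting. The reward shape and the penalty κ are assumptions, not the paper's formulation: s would encode the current swing level (and possibly an observed BER bin), a ∈ {decrease, keep, increase}, α is the learning rate, and γ is the discount factor.

```latex
\begin{aligned}
Q(s_t, a_t) &\leftarrow Q(s_t, a_t) + \alpha \Big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Big], \\
r_{t+1} &= \begin{cases}
  -P_{\text{I/O}}(s_{t+1}) & \text{if } \mathrm{BER}(s_{t+1}) \le \mathrm{BER}_{\text{target}}, \\
  -\kappa & \text{otherwise.}
\end{cases}
\end{aligned}
```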


International Symposium on Circuits and Systems | 2013

Cyber-physical management for heterogeneously integrated 3D thousand-core on-chip microprocessor

P D Sai Manoj; Hao Yu

Although 3D TSV/TSI technology provides a promising platform for heterogeneous system integration, with design drivers ranging from thousand-core microprocessors to cubic-millimeter sensors, the fundamental challenge is the lack of insight into handling the significantly increased design complexity. At the device level, new state variables from different physical domains, such as MEMS, microfluidic, and NVM devices, have to be identified and described together with the conventional states of CMOS VLSI; at the system level, cyber management of voltage-level and temperature states has to be maintained in a real-time, demand-response fashion. Moreover, a cyber-physical link is required to compress and virtualize device-level state details during system-level state control. This paper illustrates device-level 3D integration with the example of MEMS and CMOS VLSI. In addition, cyber-physical thermal management for 3D-integrated many-core microprocessors is discussed.
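The abstract only outlines cyber-physical thermal management. A minimal sketch of such a loop, assuming a lumped RC thermal model on the physical side and a hysteresis controller that steps the voltage level on the cyber side; all constants and per-level power values are hypothetical.

```python
from typing import List, Tuple

T_AMB = 45.0   # heat-sink temperature, deg C (hypothetical)
R_TH = 0.5     # thermal resistance, K/W (hypothetical)
C_TH = 2.0     # thermal capacitance, J/K (hypothetical)
DT = 0.1       # control period, s (hypothetical)
POWER_W = [40.0, 60.0, 80.0, 100.0]  # chip power per voltage level (hypothetical)


def run(steps: int, t_high: float = 80.0,
        t_low: float = 75.0) -> Tuple[List[float], List[int]]:
    temp, level = T_AMB, len(POWER_W) - 1  # start cool, at the highest voltage level
    temps: List[float] = []
    levels: List[int] = []
    for _ in range(steps):
        # Cyber side: hysteresis control of the voltage level from the sensed temperature.
        if temp > t_high and level > 0:
            level -= 1
        elif temp < t_low and level < len(POWER_W) - 1:
            level += 1
        # Physical side: lumped RC thermal model, one forward-Euler step.
        temp += DT / C_TH * (POWER_W[level] - (temp - T_AMB) / R_TH)
        temps.append(temp)
        levels.append(level)
    return temps, levels


temps, levels = run(200)
print(f"peak {max(temps):.1f} C, final level {levels[-1]}")  # peak 81.3 C, final level 1
```

With these numbers the die overshoots to about 81 °C during the initial transient and then settles just above 75 °C at the second voltage level. The hysteresis band keeps the controller from toggling the voltage level on every control period.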

Collaboration


Top Co-Authors

Hao Yu

Nanyang Technological University

Dongjun Xu

Nanyang Technological University

Kanwen Wang

Nanyang Technological University

Hantao Huang

Nanyang Technological University

Huang Hantao

Nanyang Technological University

Jiacheng Wang

Nanyang Technological University
