Fengbo Ren
Arizona State University
Publication
Featured research published by Fengbo Ren.
ACS Nano | 2014
Junpeng Li; Jiajie Liang; Lu Li; Fengbo Ren; Wei Hu; Juan Li; Shuhua Qi; Qibing Pei
A healable transparent capacitive touch screen sensor has been fabricated based on a healable silver nanowire-polymer composite electrode. The composite electrode features a silver nanowire percolation network embedded in the surface layer of a polymer substrate that comprises an ultrathin soldering polymer layer, which confines the nanowires to the surface of a healable Diels-Alder cycloaddition copolymer and lowers the contact resistance between the nanowires. The composite electrode has a figure-of-merit sheet resistance of 18 Ω/sq at 80% transmittance at 550 nm. A crack cut into the conductive surface (18 Ω/sq) is healed by heating at 100 °C, and the sheet resistance recovers to 21 Ω/sq within 6 min. A healable touch screen sensor with an 8×8 array of capacitive sensing points is prepared by stacking two composite films patterned with 8 rows and 8 columns of coupling electrodes at a 90° angle. After deliberate damage, the coupling electrodes recover their touch-sensing function upon heating at 80 °C for 30 s. An Arduino-based capacitive touch screen is demonstrated that recovers quickly from malfunction caused by razor-blade cutting. After four cycles of cutting and healing, the sensor array remains functional.
IEEE Transactions on Electron Devices | 2010
Fengbo Ren; Dejan Markovic
The use of spin-transfer torque (STT) devices for memory design has been a subject of research since the discovery of STT in MgO-based magnetic tunnel junctions (MTJs). Recently, MTJ-based computing architectures such as logic-in-memory have been proposed and are claimed to offer superior energy-delay performance over static CMOS. In this paper, we conduct an exhaustive energy-performance analysis of an STT-MTJ-based logic-in-memory (LIM-MTJ) 1-bit full adder and compare it with its CMOS counterpart. Our results show that the LIM-MTJ circuit has no energy-performance advantage over equivalent CMOS designs. We also show that MTJ-based logic circuits that require frequent MTJ switching during operation are hardly power efficient.
field programmable gate arrays | 2014
Richard Dorrance; Fengbo Ren; Dejan Markovic
Sparse Matrix-Vector Multiplication (SpMxV) is a widely used mathematical operation in many high-performance scientific and engineering applications. In recent years, tuned software libraries for multi-core microprocessors (CPUs) and graphics processing units (GPUs) have become the status quo for computing SpMxV. However, the computational throughput of these libraries for sparse matrices tends to be significantly lower than for dense matrices, mostly because the compression formats required to store sparse matrices efficiently are poorly matched to traditional computing architectures. This paper describes an FPGA-based SpMxV kernel that scales to make efficient use of the available memory bandwidth and computing resources. Benchmarking on a Virtex-5 SX95T FPGA demonstrates an average computational efficiency of 91.85%. The kernel achieves a peak computational efficiency of 99.8%, a >50x improvement over two Intel Core i7 processors (i7-2600 and i7-4770) and a >300x improvement over two NVIDIA GPUs (GTX 660 and GTX Titan) running the MKL and cuSPARSE sparse-BLAS libraries, respectively. In addition, the SpMxV FPGA kernel achieves higher performance than its CPU and GPU counterparts while using only 64 single-precision processing elements, with an overall 38-50x improvement in energy efficiency.
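To make the storage-format mismatch concrete, the sketch below multiplies a matrix stored in compressed sparse row (CSR) form by a dense vector. CSR is used here only as a common example; the abstract does not say which format the FPGA kernel uses, and the helper names `dense_to_csr` and `csr_spmv` are hypothetical. The inner loop's indirect read `x[col_idx[k]]` is the data-dependent access pattern that tends to map poorly onto cache- and SIMD-oriented processors.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(a: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix into CSR arrays (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row's nonzeros
    return values, col_idx, row_ptr


def csr_spmv(values: List[float], col_idx: List[int], row_ptr: List[int],
             x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i + 1]]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # indirect, data-dependent load of x
        y[i] = acc
    return y


A = [[4.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 2.0, 0.0],
     [0.0, 3.0, 0.0, 0.0]]
vals, cols, ptr = dense_to_csr(A)
print(vals, cols, ptr)                             # [4.0, 1.0, 2.0, 3.0] [0, 3, 2, 1] [0, 2, 3, 4]
print(csr_spmv(vals, cols, ptr, [1.0, 2.0, 3.0, 4.0]))  # [8.0, 6.0, 6.0]
```

Each stored nonzero costs one multiply-accumulate plus one irregular load of `x`, so SpMxV throughput is usually bound by memory bandwidth rather than arithmetic.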
international symposium on quality electronic design | 2012
Fengbo Ren; Henry Park; Richard Dorrance; Yuta Toriyama; C.-K. Ken Yang; Dejan Markovic
With the scaling of CMOS and magnetic tunnel junction (MTJ) devices, conventional low-current reading techniques for STT-RAMs face challenges in achieving the reliability and performance improvements expected from scaled devices. The challenges arise from the increasing variability of the CMOS sensing current and the reduction in MTJ switching current. This paper proposes a short-pulse reading circuit based on a body-voltage sensing scheme to mitigate these scaling issues. Compared to existing sensing techniques, our technique shows a substantially higher read margin (RM) despite a much shorter sensing time. A narrow current pulse applied to an MTJ significantly reduces the probability of read disturbance. The RM analysis is validated by Monte-Carlo simulations in a 65-nm CMOS technology with both CMOS and MTJ variations considered. Simulation results show that our technique provides over 300 mV RM at GHz frequencies across process-voltage-temperature (PVT) variations, while the reference designs require 4.3 ns and 2.3 ns sensing time, respectively, for a 200 mV RM. The effective read energy per bit of the proposed sensing circuit is around 195 fJ in the nominal case.
field-programmable logic and applications | 2013
Fengbo Ren; Richard Dorrance; Wenyao Xu; Dejan Markovic
Compressive sensing (CS) is a promising technology for low-power and cost-effective data acquisition in wireless healthcare systems. However, efficient real-time signal reconstruction is still challenging, and there is a clear demand for hardware acceleration. In this paper, we present the first single-precision floating-point CS reconstruction engine, implemented on a Kintex-7 FPGA using the orthogonal matching pursuit (OMP) algorithm. To achieve high performance with maximum hardware utilization, we propose a highly parallel architecture that shares the computing resources among the different tasks of OMP by using configurable processing elements (PEs). By fully utilizing the FPGA resources, our implementation has 128 PEs in parallel and operates at 53.7 MHz. In addition, it supports a 2x larger problem size and 10x more sparse coefficients than prior work, which enables higher reconstruction accuracy by adding finer details to the recovered signal. Hardware results from ECG reconstruction tests show the same level of accuracy as a double-precision C program. Compared to the execution time on a 2.27 GHz CPU, the FPGA reconstruction achieves an average speed-up of 41x.
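The greedy OMP loop that the engine parallelizes can be sketched in a few lines of NumPy. This is a minimal double-precision software reference under our own assumptions: `omp` is a hypothetical helper, and the least-squares step uses `np.linalg.lstsq` for clarity, whereas the abstract does not describe how the hardware solves it. Each iteration (1) selects the column of the sensing matrix most correlated with the residual, (2) refits all selected coefficients by least squares, and (3) updates the residual.

```python
import numpy as np


def omp(phi: np.ndarray, y: np.ndarray, k: int, tol: float = 1e-10) -> np.ndarray:
    """Recover a k-sparse x from y = phi @ x with orthogonal matching pursuit."""
    n = phi.shape[1]
    residual = y.astype(float).copy()
    support: list[int] = []
    coef = np.zeros(0)
    for _ in range(k):
        # (1) Atom selection: column most correlated with the current residual
        j = int(np.argmax(np.abs(phi.T @ residual)))
        if j not in support:
            support.append(j)
        # (2) Least-squares refit of y on all selected atoms
        sub = phi[:, support]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        # (3) New residual is orthogonal to the span of the selected atoms
        residual = y - sub @ coef
        if np.linalg.norm(residual) < tol:
            break
    x = np.zeros(n)
    x[support] = coef
    return x


# Usage on a synthetic problem (sizes are hypothetical, not the paper's)
rng = np.random.default_rng(0)
n, m, k = 128, 80, 3
phi = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_hat = omp(phi, phi @ x_true, k)
print(np.max(np.abs(x_hat - x_true)))  # recovery error
```

The correlation step `phi.T @ residual` and the least-squares refit are dense linear-algebra tasks of different shapes, which is the kind of work a configurable PE array can share across the tasks of OMP.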
IEEE Transactions on Circuits and Systems | 2013
Fengbo Ren; Henry Park; Chih-Kong Ken Yang; Dejan Markovic
With the continuing scaling of MTJs, high-speed reading of STT-RAM becomes increasingly difficult. Recently, a body-voltage sensing circuit (BVSC) has been proposed for boosting the sensing speed. This paper analyzes the effectiveness of a reference calibration technique in compensating for device mismatches and improving the read margin of the BVSC. HSPICE simulation results show that a 2-bit reference calibration can improve the worst-case read margin in a 1-Mb memory by more than 3×. This leads to up to 30% higher yield across all process corners. To maintain the yield improvement even in the worst-case corner, independent calibration circuitry has to be deployed for each memory array.
international solid-state circuits conference | 2015
Fengbo Ren; Dejan Markovic
Compressive sensing (CS) is a promising solution for low-power on-body sensors for 24/7 wireless health monitoring [1]. In such an application, a mobile data aggregator performing real-time signal reconstruction is desired for timely prediction and proactive prevention. However, CS reconstruction requires solving a sparse approximation (SA) problem, whose high computational complexity makes software solvers, which consume 2-50 W on CPUs, very energy-inefficient for real-time processing. This paper presents a configurable SA engine in 40 nm CMOS technology for energy-efficient mobile data aggregation from compressively sampled biomedical signals. The configurable architecture achieves 100% utilization of computing resources. An efficient data-shuffling scheme is implemented to reduce memory leakage by 40%. At the minimum-energy point (MEP), the SA engine achieves real-time throughput for reconstructing 61 to 237 channels of biomedical signals simultaneously with <1% of a mobile device's 2 W power budget, which is 76-350× more energy-efficient than prior hardware designs.
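For reference, the SA problem solved at the aggregator can be written in its standard form. The notation is ours, not the paper's: x is an N-sample signal window, Φ the M×N sensing matrix (M < N), Ψ a sparsifying dictionary, s the sparse coefficient vector, and ε a noise tolerance.

```latex
% Sensor side: M < N compressive measurements of an N-sample window
y = \Phi x, \qquad \Phi \in \mathbb{R}^{M \times N}, \quad M < N
% Aggregator side: sparse approximation in a dictionary \Psi, with x = \Psi s
\hat{s} = \arg\min_{s} \lVert s \rVert_0
  \quad \text{subject to} \quad \lVert y - \Phi \Psi s \rVert_2 \le \epsilon,
\qquad \hat{x} = \Psi \hat{s}
```

The ℓ0 problem is NP-hard in general, which is why practical solvers rely on greedy methods such as OMP or on convex ℓ1 relaxations.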
IEEE Journal of Solid-state Circuits | 2016
Fengbo Ren; Dejan Markovic
Compressive sensing (CS) is a promising technology for realizing low-power and cost-effective wireless sensor nodes (WSNs) in pervasive health systems for 24/7 health monitoring. Due to the high computational complexity (CC) of the reconstruction algorithms, software solutions cannot fulfill the energy-efficiency needs of real-time processing. In this paper, we present a 12-237 kS/s, 12.8 mW sparse-approximation (SA) engine chip that enables energy-efficient data aggregation of compressively sampled physiological signals on mobile platforms. The SA engine chip, integrated in 40 nm CMOS, can support the simultaneous reconstruction of over 200 channels of physiological signals while consuming <1% of a smartphone's power budget. Such energy-efficient reconstruction enables a two-to-three-fold energy saving at the sensor nodes of a CS-based health monitoring system compared to traditional Nyquist-based systems, while providing timely feedback and bringing signal intelligence closer to the user.
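On the sensor side, CS encoding reduces to one matrix-vector product per signal window, and only M < N measurements are transmitted. The sketch below assumes a random ±1 (Bernoulli) sensing matrix, a common choice that the abstract does not specify; the helper names, window length, measurement count, and the sine wave standing in for a physiological signal are all hypothetical.

```python
import numpy as np


def make_sensing_matrix(m: int, n: int, seed: int = 0) -> np.ndarray:
    """Random +/-1 (Bernoulli) sensing matrix, scaled by 1/sqrt(m)."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)


def compress_window(phi: np.ndarray, window: np.ndarray) -> np.ndarray:
    """Sensor-side CS encoding: a single product y = phi @ x per window."""
    return phi @ window


n, m = 256, 64                                      # hypothetical window length / measurements
phi = make_sensing_matrix(m, n)
window = np.sin(2 * np.pi * 5 * np.arange(n) / n)   # stand-in for one signal window
y = compress_window(phi, window)
print(y.shape, n / m)                               # (64,) 4.0
```

With ±1 entries the product needs only additions and subtractions (the 1/√M scale can be folded into reconstruction), which is one reason CS encoding is cheap enough for sensor nodes; the reconstruction burden shifts to the aggregator, where the SA engine runs.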
international conference on acoustics, speech, and signal processing | 2016
Kai Xu; Yixing Li; Fengbo Ren
Wireless body area networks (WBANs) are emerging in mobile healthcare to replace traditional wire-connected monitoring devices. Because wireless data transmission dominates the power cost of sensor nodes, it is beneficial to reduce the data size without significant information loss. Compared to existing compression techniques, compressive sensing (CS) is an ideal candidate for achieving this goal. In this paper, we propose a general framework that combines CS with online dictionary learning (ODL). The learned dictionary captures the individual characteristics of the original signal, under which the signal has an even sparser representation than under pre-determined dictionaries. As a consequence, the compression ratio is improved by 2-4× compared to prior work. In addition, the proposed framework offloads pre-processing from the sensor nodes to the server node prior to dictionary learning, further reducing hardware costs. Because it is data-driven, the proposed framework has the potential to be used with a wide range of physiological signals.
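The abstract does not detail the ODL algorithm, so the sketch below substitutes a simple stochastic-gradient variant to show the general idea: each incoming window is sparse-coded against the current dictionary (here with ISTA for an ℓ1-regularized fit), and the dictionary then takes a gradient step that reduces that window's reconstruction error, with atoms renormalized to unit length. The function names, step size, regularization weight, and random training data are assumptions for illustration.

```python
import numpy as np


def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_code(D: np.ndarray, x: np.ndarray, lam: float = 0.1, n_iter: int = 50) -> np.ndarray:
    """ISTA for min_a 0.5 * ||x - D a||^2 + lam * ||a||_1."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth term's gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - D.T @ (D @ a - x) / L, lam / L)
    return a


def online_dictionary_update(D: np.ndarray, x: np.ndarray, lr: float = 0.05, lam: float = 0.1):
    """One streaming step: sparse-code x, then a gradient step on D."""
    a = sparse_code(D, x, lam)
    residual = x - D @ a
    # Gradient of 0.5 * ||x - D a||^2 w.r.t. D is -(residual a^T); step against it
    D = D + lr * np.outer(residual, a)
    norms = np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D / norms, a  # keep every atom unit-norm


rng = np.random.default_rng(0)
n_samples, n_atoms = 32, 48              # hypothetical window length / dictionary size
D = rng.standard_normal((n_samples, n_atoms))
D /= np.linalg.norm(D, axis=0)
for _ in range(100):                     # stream of synthetic training windows
    D, a = online_dictionary_update(D, rng.standard_normal(n_samples))
```

In the framework described above, this learning would run on the server node, and the learned dictionary would take the place of a pre-determined one during reconstruction.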
IEEE Embedded Systems Letters | 2014
Fengbo Ren; Chenxin Zhang; Liang Liu; Wenyao Xu; Viktor Öwall; Dejan Markovic
QR decomposition (QRD) is used to solve least-squares (LS) problems in a wide range of applications. However, traditional QR decomposition methods, such as Gram-Schmidt (GS), require high computational complexity and nonlinear operations to achieve high throughput, limiting their use on resource-limited platforms. To enable efficient LS computation on embedded systems for real-time applications, this paper presents an alternative decomposition method, called QDRD, which relaxes system requirements while maintaining the same level of performance. Specifically, QDRD eliminates both the square-root operations in the normalization step and the divisions in the subsequent backward substitution. Simulation results show that QDRD can significantly improve the accuracy and reliability of the factorization matrices, especially when executed on precision-limited platforms. Furthermore, benchmarking results on an embedded platform show that QDRD provides consistently better energy efficiency and higher throughput than GS-QRD in solving LS problems. Improvements of up to 4× in energy efficiency and 6.5× in throughput can be achieved for small problem sizes.
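The abstract names the two properties of QDRD but not its exact algorithm. One factorization with those properties is a square-root-free Gram-Schmidt, A = QR, where Q has mutually orthogonal but unnormalized columns with QᵀQ = D (diagonal) and R is unit upper triangular. The normal equations then reduce to Rx = D⁻¹Qᵀb, so no square roots are needed and back substitution needs no divisions. This is a hedged sketch of that idea with hypothetical function names, not necessarily the paper's QDRD; the divisions that remain are by the squared column norms d_k.

```python
import numpy as np


def qdr_decompose(A):
    """Square-root-free modified Gram-Schmidt: A = Q @ R with Q^T Q = diag(d).

    Q has orthogonal (not unit-norm) columns and R is unit upper triangular.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    Q = A.copy()
    R = np.eye(n)
    d = np.zeros(n)
    for k in range(n):
        d[k] = Q[:, k] @ Q[:, k]              # squared norm: no square root
        for j in range(k + 1, n):
            R[k, j] = (Q[:, k] @ Q[:, j]) / d[k]
            Q[:, j] -= R[k, j] * Q[:, k]      # remove the q_k component from column j
    return Q, d, R


def qdr_solve(A, b):
    """Least-squares solve via R x = D^-1 Q^T b."""
    Q, d, R = qdr_decompose(A)
    z = (Q.T @ b) / d                         # apply D^-1: one division per unknown
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):            # unit diagonal: back substitution without division
        x[i] = z[i] - R[i, i + 1:] @ x[i + 1:]
    return x


rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))
b = rng.standard_normal(8)
x = qdr_solve(A, b)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```

Because Q is never normalized, the column scaling is carried in d instead of being applied with square roots, which is the property the abstract credits for better behavior on precision-limited platforms.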