Hantao Huang
Nanyang Technological University
Publication
Featured research published by Hantao Huang.
design, automation, and test in europe | 2012
Chun Zhang; Wei Wu; Hantao Huang; Hao Yu
Real-time and decentralized energy resource allocation has become the main feature to develop for the next generation energy management system (EMS). In this paper, a minority game (MG)-based EMS (MG-EMS) is proposed for smart buildings with hybrid energy sources: the main energy resource from the electrical power-grid and a renewable energy resource from solar photovoltaic (PV) cells. Compared to the traditional static and centralized EMS (SC-EMS), and the recent multi-agent-based EMS (MA-EMS) based on price-demand competition, our proposed MG-EMS can achieve up to 51× and 147× utilization rate improvements respectively with regard to the fairness of solar energy resource allocation. In addition, the proposed MG-EMS can also reduce peak energy demand on the main power-grid by 30.6%. As such, one can significantly reduce the cost and improve the stability of the micro-grid of smart buildings with a high utilization rate of solar energy.
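The abstract does not spell out the game rules, so the following is only a minimal sketch of the textbook minority game, not the MG-EMS itself. It assumes an odd number of agents, binary actions (read here, hypothetically, as 1 = request solar energy, 0 = stay on the grid), and lookup-table strategies scored by how often they would have landed on the minority side. The function name and all parameters are illustrative.

```python
import random

def play_minority_game(n_agents=11, memory=3, n_strategies=2, rounds=200, seed=0):
    """Basic minority game: each round every agent picks 0 or 1; the less-chosen side wins."""
    rng = random.Random(seed)
    n_histories = 2 ** memory
    # Each strategy is a lookup table: history index -> action (0 or 1).
    strategies = [
        [[rng.randint(0, 1) for _ in range(n_histories)] for _ in range(n_strategies)]
        for _ in range(n_agents)
    ]
    scores = [[0] * n_strategies for _ in range(n_agents)]
    history = rng.randrange(n_histories)  # last `memory` winning sides, packed as bits
    attendance = []
    for _ in range(rounds):
        actions = []
        for a in range(n_agents):
            # Each agent plays its currently best-scoring strategy.
            best = max(range(n_strategies), key=lambda s: scores[a][s])
            actions.append(strategies[a][best][history])
        ones = sum(actions)
        winner = 1 if ones < n_agents - ones else 0  # odd n_agents, so no ties
        # Reward every strategy that would have picked the minority side.
        for a in range(n_agents):
            for s in range(n_strategies):
                if strategies[a][s][history] == winner:
                    scores[a][s] += 1
        history = ((history << 1) | winner) % n_histories
        attendance.append(ones)
    return attendance
```

Calling `play_minority_game()` returns the number of agents choosing action 1 in each round. Because every agent decides from its own strategy scores and the shared history, no central controller is involved, which is the decentralized property the abstract emphasizes.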
design, automation, and test in europe | 2016
Hantao Huang; Yuehua Cai; Hao Yu
Indoor data analytics is one typical example of ambient intelligence with behaviour or feature extraction from environmental data. It can be utilized to help improve comfort level in building and room for occupants. To address dynamic ambient change in a large-scaled space, real-time and distributed data analytics is required on sensor (or gateway) network, which however has limited computing resources. This paper proposes a computationally efficient data analytics by distributed-neuron-network (DNN) based machine learning with application for indoor positioning. It is based on one incremental L2-norm based solver for learning collected WiFi-data at each gateway and is further fused for all gateways in the network to determine the location. Experimental results show that with multiple distributed gateways running in parallel, the proposed algorithm can achieve 50x and 38x speedup during data testing and training time respectively with comparable positioning accuracy, when compared to traditional support vector machine (SVM) method.
ACM Journal on Emerging Technologies in Computing Systems | 2017
Leibin Ni; Hantao Huang; Zichuan Liu; Rajiv V. Joshi; Hao Yu
Emerging resistive random-access memory (RRAM) can provide not only non-volatile memory storage but also intrinsic logic for matrix-vector multiplication, which is ideal for a low-power and high-throughput data analytics accelerator performed in memory. However, the existing RRAM-based computing device is mainly assumed to perform multi-level analog computing, whose result is sensitive to process non-uniformity and which incurs additional AD-conversion and I/O overhead. This paper explores the data analytics accelerator on a binary RRAM-crossbar. Accordingly, a distributed in-memory computing architecture is proposed, together with the design of the corresponding components and control protocol. Both the memory array and the logic accelerator can be implemented by RRAM-crossbar purely in binary, where logic-memory pairs can be distributed with a control-bus protocol. Based on numerical results for fingerprint matching mapped on the proposed RRAM-crossbar, the proposed architecture has shown 2.86x faster speed, 154x better energy efficiency, and 100x smaller area when compared to the same design by CMOS-based ASIC.
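As a software analogy only, and not the circuit described in the paper, a multi-bit matrix-vector product can be built from purely binary products: split the weights and inputs into bit planes, multiply each pair of planes with AND plus a count of ones, and combine the partial results by shift-and-add. The sketch below assumes non-negative integers that fit in `n_bits`; all function names are illustrative.

```python
def to_bit_planes(matrix, n_bits):
    """Split a matrix of non-negative integers into n_bits binary matrices (LSB first)."""
    return [[[(w >> b) & 1 for w in row] for row in matrix] for b in range(n_bits)]

def binary_matvec(bit_matrix, bit_vector):
    """Binary product: AND each weight bit with the input bit, then count ones per row."""
    return [sum(w & x for w, x in zip(row, bit_vector)) for row in bit_matrix]

def matvec_via_bit_planes(matrix, vector, n_bits):
    """Multi-bit matrix-vector product assembled from binary products by shift-and-add."""
    w_planes = to_bit_planes(matrix, n_bits)
    x_planes = [[(x >> b) & 1 for x in vector] for b in range(n_bits)]
    result = [0] * len(matrix)
    for i, w_plane in enumerate(w_planes):
        for j, x_plane in enumerate(x_planes):
            partial = binary_matvec(w_plane, x_plane)
            for r, p in enumerate(partial):
                # Bit i of the weight times bit j of the input carries weight 2**(i + j).
                result[r] += p << (i + j)
    return result
```

For example, `matvec_via_bit_planes([[3, 5, 2], [7, 0, 1]], [4, 6, 1], 3)` agrees with the ordinary integer product. Each `binary_matvec` call touches only 0/1 values, which is the kind of operation a binary crossbar is suited to.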
international symposium on circuits and systems | 2016
Leibin Ni; Hantao Huang; Hao Yu
On-line machine learning has become a necessity for future data analytics. This work will show an ℓ2 norm based hardware solver for on-line machine learning that can significantly reduce training time when compared to the traditional gradient-based solution using backward propagation. We will show that the intensive matrix-vector multiplication in the ℓ2 norm solution can be mapped onto a distributed in-memory accelerator using the recent resistive switching random access memory (RRAM) device. A digitized matrix-vector multiplication accelerator will be developed based on the distributed RRAM-crossbar. Such a distributed RRAM-crossbar architecture can utilize the reformulated ℓ2 norm solver with a scalable and energy-efficient solution for real-time training and testing in image recognition. Experiment results have shown that significant speedup can be achieved for matrix-vector multiplication in the ℓ2 norm solver such that the overall training and testing times can both be reduced. In addition, large energy savings can also be achieved when compared to the traditional CMOS-based out-of-memory computing architecture.
Archive | 2017
Hantao Huang; Rai Suleman Khalid; Hao Yu
Computational intelligence techniques are intelligent computational methodologies, such as neural networks, used to solve complex real-world problems. One example is to design a smart agent that makes decisions within an environment in response to the presence of human beings. A smart building/home is a typical computational intelligence based system enriched with sensors to gather information and processors to analyze it. Indoor computational intelligence based agents can perform behavior or feature extraction from environmental data such as power, temperature, and lighting data, and hence further help improve the comfort level for human occupants in a building. The current indoor system cannot address dynamic ambient change with a real-time response under emergency because the processing backend in the cloud introduces latency. Therefore, in this chapter we have introduced distributed machine learning algorithms (SVM and neural network) mapped on smart-gateway networks. Scalability and robustness are considered to perform real-time data analytics. Furthermore, as the success of the system depends on the trust of users, network intrusion detection for the smart gateway has also been developed to provide system security. Experimental results have shown that with distributed machine learning mapped on smart-gateway networks, real-time data analytics can be performed to support sensitive, responsive and adaptive intelligent systems.
Archive | 2017
Hao Yu; Leibin Ni; Hantao Huang
The recently emerging memristor can provide not only non-volatile memory storage but also intrinsic computing for matrix-vector multiplication, which is ideal for a low-power and high-throughput data analytics accelerator performed in memory. However, the existing memristor-crossbar based computing is mainly assumed to be multi-level analog computing, whose result is sensitive to process non-uniformity and which incurs additional overhead from AD-conversion and I/O. In this chapter, we explore the matrix-vector multiplication accelerator on a binary memristor-crossbar with adaptive 1-bit-comparator based parallel conversion. Moreover, a distributed in-memory computing architecture is also developed with a corresponding control protocol. Both the memory array and the logic accelerator are implemented on the binary memristor-crossbar, where logic-memory pairs can be distributed with a control-bus protocol. Experiment results have shown that compared to the analog memristor-crossbar, the proposed binary memristor-crossbar can achieve significant area-saving with better calculation accuracy. Moreover, significant speedup can be achieved for matrix-vector multiplication in the neuron-network based machine learning such that the overall training and testing times can both be reduced. In addition, large energy savings can also be achieved when compared to the traditional CMOS-based out-of-memory computing architecture.
design, automation, and test in europe | 2015
Yuhao Wang; Hantao Huang; Leibin Ni; Hao Yu; Mei Yan; Chuliang Weng; Wei Yang; Junfeng Zhao
Data analytics such as face recognition involves large volume of image data, and hence leads to grand challenge on mobile platform design with strict power requirement. Emerging non-volatile STT-MRAM has the minimum leakage power and comparable speed to SRAM, and hence is considered as a promising candidate for data-oriented mobile computing. However, there exists significantly higher write-energy for STT-MRAM when compared to the SRAM. Based on the use of STT-MRAM, this paper introduces an energy-efficient non-volatile in-memory accelerator for a sparse-representation based face recognition algorithm. We find that by projecting high-dimension image data to much lower dimension, the current scaling for STT-MRAM write operation can be applied aggressively, which leads to significant power reduction yet maintains quality-of-service for face recognition. Specifically, compared to a baseline with SRAM, leakage power and dynamic power are reduced by 91.4% and 79% respectively with only slight compromise on recognition rate.
ACM Transactions on Design Automation of Electronic Systems | 2018
Hantao Huang; Hang Xu; Yuehua Cai; Rai Suleman Khalid; Hao Yu
Real-time data analytics for smart-grid energy management is challenging with consideration of both occupant behavior profiles and energy profiles. This article proposes a distributed and networked machine-learning platform on smart-gateway-based smart-grid in residential buildings. It can analyze occupant behaviors, provide short-term load forecasting, and allocate renewable energy resources. First, occupant behavior profile is captured by real-time indoor positioning system with WiFi data analytics; and the energy profile is extracted by real-time meter system with electricity load data analytics. Then, the 24-hour occupant behavior profile and energy profile are fused with prediction using an online distributed machine-learning algorithm with real-time data update. Based on the forecasted occupant behavior profile and energy profile, solar energy source is allocated to reduce peak demand on the main electricity power-grid. The whole management flow can be operated on the distributed smart-gateway network with limited computational resources but with a supported general machine-learning engine. Experimental results on occupant behavior extraction show that the proposed algorithm can achieve 91.2% positioning accuracy within 3.64m. Moreover, 50× and 38× speed-up is obtained during data testing and training, respectively, when compared to traditional support vector machine (SVM) method. For short-term load forecasting, it is 14.83% more accurate when compared to SVM-based data analytics. Based on the predicted occupant behavior profile and energy profile, our proposed energy management system can achieve 19.66% more peak load reduction and 26.41% more cost saving as compared to the SVM-based method.
international symposium on nanoscale architectures | 2016
Leibin Ni; Hantao Huang; Hao Yu
This paper introduces a memristor-network based accelerator for L2-norm based machine learning. A coupled-memristor-oscillator network is developed for an L2-norm calculation; and a binary-memristor-crossbar network is developed to accelerate matrix-vector multiplication. As such, one can map gradient-descent (of L2-norm) based on-line machine learning on the proposed memristor-network that is composed of coupled-oscillator (to sample L2-norm) and binary-crossbar (to digitize L2-norm). Experiment results have shown that such a memristor-network based accelerator can achieve significant power reduction and runtime speed-up for both training and testing compared to the conventional CMOS-CPU based implementation.
ieee international 3d systems integration conference | 2016
Hantao Huang; Leibin Ni; Yuhao Wang; Hao Yu; Zongwei Wang; Yimao Cai; Ru Huang
Incremental machine learning is required for future real-time data analytics. This paper introduces a 3D multilayer CMOS-RRAM accelerator for incremental least-squares based learning on a neural network. Given input of buffered data held on the layer of an RRAM memory, intensive matrix-vector multiplication can first be accelerated on the layer of a digitized RRAM-crossbar. The remaining incremental least-squares algorithmic operations for feature extraction and classifier training can be accelerated on the layer of CMOS ASIC, using an incremental Cholesky factorization accelerator realized with consideration of parallelism and pipelining. Experiment results have shown that such a 3D accelerator can significantly reduce training time with acceptable accuracy. Compared to a 3D-CMOS-ASIC implementation, it can achieve 1.28x smaller area, 2.05x faster runtime and 12.4x energy reduction. Compared to a GPU implementation, our work shows 3.07x speed-up and 162.86x energy-saving.
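The abstract does not give the update rule, so the sketch below rests on one assumption: that "incremental" means the least-squares system grows by one row and column at a time (for instance, when one more hidden neuron is added). Under that assumption, the Cholesky factor of the enlarged matrix can reuse the existing factor and compute only the new bottom row. Function names are illustrative and this is not the paper's hardware design.

```python
import math

def forward_substitute(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, row in enumerate(L):
        s = sum(row[k] * y[k] for k in range(i))
        y.append((b[i] - s) / row[i])
    return y

def cholesky_append(L, new_col, new_diag):
    """Extend the Cholesky factor L of A to the factor of [[A, c], [c^T, d]].

    new_col is c (length n) and new_diag is d. Only the new bottom row is computed;
    the existing rows of L are reused unchanged (padded with a zero).
    """
    y = forward_substitute(L, new_col)
    pivot_sq = new_diag - sum(v * v for v in y)
    if pivot_sq <= 0:
        raise ValueError("extended matrix is not positive definite")
    grown = [row + [0.0] for row in L]
    grown.append(y + [math.sqrt(pivot_sq)])
    return grown
```

Starting from `[[2.0]]` (the factor of the 1×1 matrix `[[4]]`), `cholesky_append([[2.0]], [2.0], 3.0)` yields the factor of `[[4, 2], [2, 3]]`. Each extension costs one triangular solve rather than a full refactorization, which is what makes the incremental form attractive for on-line training.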