Wenhan Zhu

Shanghai Jiao Tong University

Publication

Featured research published by Wenhan Zhu.

PLOS ONE | 2018

Combination of near-infrared and thermal imaging techniques for the remote and simultaneous measurements of breathing and heart rates under sleep situation

Menghan Hu; Guangtao Zhai; Duo Li; Yezhao Fan; Huiyu Duan; Wenhan Zhu; Xiaokang Yang; You Yang

To achieve simultaneous and unobtrusive measurement of breathing rate (BR) and heart rate (HR) during nighttime, we leverage a far-infrared imager and an infrared camera equipped with an IR-Cut lens and an infrared lighting array to develop a dual-camera imaging system. A custom-built cascade face classifier, comprising a conventional Adaboost model and a fully convolutional network trained on 32K images, was used to detect the face region in registered infrared images. The region of interest (ROI), covering the mouth and nose regions, was then determined by discriminative regression and coordinate conversions of three selected landmarks. Subsequently, a tracking algorithm based on spatio-temporal context learning was applied to follow the ROI in the thermal video, and the raw signal was extracted synchronously. Finally, a custom-made time-domain signal analysis approach was developed to determine BR and HR. A dual-mode sleep video database, comprising videos recorded under illumination intensities ranging from 0 to 3 Lux, was constructed to evaluate the effectiveness of the proposed system and algorithms. In linear regression analysis, a determination coefficient (R²) of 0.831 was observed between the measured BR and the reference BR, and this value was 0.933 for HR measurement. In addition, the Bland-Altman plots of BR and HR showed that almost all the data points were located within their respective 95% limits of agreement. Consequently, the overall performance of the proposed technique is acceptable for BR and HR estimation during nighttime.
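The two agreement statistics named in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions (R² as the squared Pearson correlation of a simple linear fit, and Bland-Altman limits as bias ± 1.96 standard deviations of the paired differences); the abstract does not state which variants the authors used. The heart-rate values and function names below are hypothetical, not data from the paper.

```python
from statistics import mean, stdev

def r_squared(measured, reference):
    """Determination coefficient of a simple linear fit (squared Pearson r)."""
    mx, my = mean(reference), mean(measured)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference, measured))
    sxx = sum((x - mx) ** 2 for x in reference)
    syy = sum((y - my) ** 2 for y in measured)
    return (sxy * sxy) / (sxx * syy)

def bland_altman_limits(measured, reference):
    """Return (lower limit, bias, upper limit) of the 95% limits of agreement."""
    diffs = [m - r for m, r in zip(measured, reference)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # assumed conventional 1.96 * sample SD
    return bias - spread, bias, bias + spread

# Hypothetical paired readings in beats per minute (not from the paper).
reference_hr = [58.0, 62.0, 65.0, 70.0, 74.0, 80.0]
measured_hr = [57.0, 63.5, 64.0, 71.0, 72.5, 81.0]

r2 = r_squared(measured_hr, reference_hr)
lower, bias, upper = bland_altman_limits(measured_hr, reference_hr)
inside = sum(lower <= m - r <= upper for m, r in zip(measured_hr, reference_hr))

print(f"R^2 = {r2:.3f}")
print(f"bias = {bias:.3f} bpm, limits = [{lower:.3f}, {upper:.3f}] bpm")
print(f"{inside} of {len(reference_hr)} points inside the limits")
```

A point outside the limits would indicate a reading whose disagreement with the reference is unusually large relative to the rest of the sample.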

international conference on systems signals and image processing | 2017

IVQAD 2017: An immersive video quality assessment database

Huiyu Duan; Guangtao Zhai; Xiaokang Yang; Duo Li; Wenhan Zhu

This paper presents a new database, the Immersive Video Quality Assessment Database 2017 (IVQAD 2017), intended for immersive video quality assessment in virtual reality environments. Video quality assessment (VQA) plays an important role in video research. Virtual reality technology is now widely used, and playing videos in virtual reality visual systems is becoming increasingly popular. However, existing VQA research mainly focuses on traditional videos. In this paper, we build IVQAD, which contains 10 raw videos and 150 distorted videos. Bit rate, frame rate and resolution were considered as quality degradation factors. All the videos were encoded with MPEG-4. Subjects were asked to assess the videos in a virtual reality environment, and the mean opinion score (MOS) was computed from their ratings. Using IVQAD 2017, researchers can explore the influence of resolution, video compression and video packet loss on immersive video quality.
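As a rough illustration of how a MOS is derived from subject ratings, the sketch below averages the scores given to each video. The rating scale (1 to 5), the video names and the absence of any outlier screening are assumptions for illustration; the abstract does not describe the exact procedure.

```python
from statistics import mean

def mean_opinion_scores(ratings):
    """Map each video name to the arithmetic mean of its subject ratings."""
    return {video: mean(scores) for video, scores in ratings.items()}

# Hypothetical ratings on an assumed 1-5 scale, one entry per subject.
ratings = {
    "video_a_low_bitrate": [2, 3, 2, 3],
    "video_a_high_bitrate": [4, 5, 4, 5],
}

mos = mean_opinion_scores(ratings)
for video, score in mos.items():
    print(f"{video}: MOS = {score}")
```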

Signal Processing | 2018

Arrow’s Impossibility Theorem inspired subjective image quality assessment approach

Wenhan Zhu; Guangtao Zhai; Menghan Hu; Jing Liu; Xiaokang Yang

A large number of subjective image quality assessment databases have been constructed in the last decade. The Mean Opinion Score (MOS) (with single or double stimulus) and Paired Comparison (PC) are the two dominant approaches for collecting ground truth quality ratings, and usually 15 or more subjects are needed for each image. In this paper, we show that there is a potential “dictatorship” risk in using such averaging-over-multiple-ratings methods. Using Arrow’s Impossibility Theorem (AIT), we prove that satisfying unanimity and independence of irrelevant alternatives (IIA) will generate a “pivotal subject”, who in fact determines the final rank of image quality. We also prove that there is no ideal democratic approach to synthesizing the opinions of all subjects. Therefore, we advocate recruiting a small number of experts (a.k.a. the “golden eyes”) for subjective viewing tests. To verify the reliability of our proposal, experiments are performed on two different databases, one of general distorted images and one of professional images (here, Terahertz security images). In each experiment, the raw scores of images are subjectively assigned by at least 15 inexperienced viewers and 3 experts, and the MOS or difference mean opinion score (DMOS) is obtained. Afterwards, the correlation between the scores rated by naive subjects and by experts is analyzed. For the general image experiment, the DMOS of inexperienced viewers is shown to be highly related to the DMOS of experts based on six effective evaluation metrics. In the professional image experiment, the preferences of experts also maintain favourable relevance with the opinions of inexperienced viewers on the overall quality of THz images. Moreover, when assessing the quality of illegal substance regions in THz images, the experts have higher accuracy than the inexperienced observers. In conclusion, the results of the two validation experiments verify that a small number of experts is more suitable for assessing the perceptual quality of images, which can reduce the cost and simplify the procedure of creating databases.

quality of multimedia experience | 2017

No-reference quality assessment for JPEG compressed images

Yucheng Zhu; Guangtao Zhai; Ke Gu; Wenhan Zhu

JPEG is one of the most commonly used compression standards for digital images. The quality factor (Qfactor) of a JPEG compressed image is a suitable indicator of its perceptual quality. However, the information about the compressor might be unknown for various reasons. To estimate the Qfactor, we recompress the previously compressed image and measure the consistency between the two. We then define the fixed points (the points on the Qfactor axis where the content of the recompressed images is almost the same as that of the directly compressed images) by following the Qfactor-based specifications, and form the image set. The quality of JPEG compressed images is measured by combining the estimated Qfactor with the features extracted from the image set. The experimental results confirm that the proposed image quality assessment technique, which is no-reference, is able to faithfully predict the visual quality of JPEG compressed images.

pacific rim conference on multimedia | 2017

On the Impact of Environmental Sound on Perceived Visual Quality

Wenhan Zhu; Guangtao Zhai; Wei Sun; Yi Xu; Jing Liu; Yucheng Zhu; Xiaokang Yang

Most existing visual quality assessment databases are created under controlled conditions where the experimental environment is always kept silent. However, practical viewing environments often contain diverse environmental sounds. It is our daily experience that different sounds (e.g. chatter, honking and music) can affect our emotions and hence influence our perception of images. There is therefore a gap between visual quality under environmental sounds and existing research on visual quality. In this paper, we perform subjective quality evaluations with different types and volumes of environmental sounds. We build a rigorous experimental system to control various conditions of environmental sounds and construct the environmental sound–image database. Afterwards, the influence of environmental sounds on perceived visual quality is analysed from four perspectives: sound categories, sound volumes, distortion levels of images, and image contents.

international conference on multimedia and expo | 2017

Dual-mode imaging system for non-contact heart rate estimation during night

Menghan Hu; Guangtao Zhai; Duo Li; Yezhao Fan; Huiyu Duan; Wenhan Zhu; Xiaokang Yang

To estimate heart rate (HR) at night remotely and unobtrusively, we use a far-infrared camera and an RGB-Infrared camera to develop a dual-mode imaging system. The RGB or infrared images are first registered with the far-infrared images through an affine transformation. A custom-made cascade face classifier, which contains a conventional Adaboost model and a fully convolutional network, was applied to detect the face in registered infrared images. The fully convolutional network was trained on 32K images from the PASCAL dataset. Subsequently, two facial regions, viz. mouth and nose, were determined by discriminative regression via the coordinate conversions of three selected landmarks. Spatio-temporal context learning was used to track the mouth and nose regions in the far-infrared image sequence. Then, the raw image feature was extracted from these two regions of interest. Finally, a state-of-the-art signal analysis method was used to calculate HR in the time domain of the extracted raw signal. For the validation experiment, we established the dual-mode sleep video database to verify the performance of the proposed system and algorithms. All videos in the database were filmed in environments where the actual illumination intensity ranged from 0 to 3 Lux. The results demonstrated that the determination coefficient (R²) was 0.933 for HR estimation in linear regression analysis. The Bland-Altman analysis showed that almost all the data points were located within the 95% upper and lower limits of agreement, which were 4.293 and −5.293 bpm, respectively. Therefore, the proposed technique is efficient for non-contact and unobtrusive HR estimation at night.

arXiv: Computer Vision and Pattern Recognition | 2017

Terahertz Security Image Quality Assessment by No-reference Model Observers

Menghan Hu; Xiongkuo Min; Wenhan Zhu; Yucheng Zhu; Zhaodi Wang; Xiaokang Yang; Guang Tian

To enable the development of objective image quality assessment (IQA) algorithms for THz security images, we constructed the THz security image database (THSID), which includes a total of 181 THz security images with a resolution of 127×380. The main distortion types in THz security images were first analyzed to design subjective evaluation criteria for acquiring the mean opinion scores. Subsequently, existing no-reference IQA algorithms, which were 5 opinion-aware approaches viz., NFERM, GMLF, DIIVINE, BRISQUE and BLIINDS2, and 8 opinion-unaware approaches viz., QAC, SISBLIM, NIQE, FISBLIM, CPBD, S3 and Fish_bb, were executed to evaluate THz security image quality. The statistical results demonstrated the superiority of Fish_bb over the other tested IQA approaches for assessing THz image quality, with PLCC (SROCC) values of 0.8925 (-0.8706) and an RMSE value of 0.3993. The linear regression analysis and Bland-Altman plot further verified that Fish_bb could substitute for subjective IQA. Nonetheless, for the classification of THz security images, we tended to use S3 as a criterion for ranking THz security image grades because of its relatively low false positive rate in classifying bad THz image quality into the acceptable category (24.69%). Interestingly, due to the specific properties of THz images, the average pixel intensity performed better than the more complicated IQA algorithms above, with PLCC, SROCC and RMSE of 0.9001, -0.8800 and 0.3857, respectively. This study will help users such as researchers or security staff to obtain THz security images of good quality. Currently, our research group is attempting to make this research more comprehensive.
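The three agreement figures quoted in this abstract (PLCC, SROCC and RMSE) can be computed as in the sketch below, which uses only the standard library. It assumes the textbook definitions: Pearson correlation, Spearman correlation as Pearson on tie-averaged ranks, and root-mean-square error. Any nonlinear mapping that IQA studies often apply before PLCC and RMSE is omitted, and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank-order correlation coefficient (SROCC)."""
    return pearson(average_ranks(x), average_ranks(y))

def rmse(predicted, target):
    """Root-mean-square error between predicted and target scores."""
    return sqrt(mean((p - t) ** 2 for p, t in zip(predicted, target)))

# Hypothetical subjective scores and raw metric outputs (not from the paper).
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
metric = [9.0, 7.5, 6.0, 3.0, 1.0]          # falls as quality rises
predicted_mos = [1.5, 2.0, 2.5, 4.0, 5.5]   # metric mapped to the MOS scale

plcc = pearson(metric, mos)
srocc = spearman(metric, mos)
error = rmse(predicted_mos, mos)
print(f"PLCC = {plcc:.4f}, SROCC = {srocc:.4f}, RMSE = {error:.4f}")
```

In this toy data the metric decreases as the subjective score increases, so SROCC comes out negative. A negative rank correlation by itself only means the metric is inversely related to the subjective score.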

International Forum on Digital TV and Wireless Multimedia Communications | 2017

Selection of Good Display Mode for Terahertz Security Image via Image Quality Assessment

Zhaodi Wang; Menghan Hu; Wenhan Zhu; Xiaokang Yang; Guang Tian

To provide good display performance for THz (terahertz) security images, we designed several display modes on the custom-built THz security image database (THSID). Based on our statistical analysis of THz images, a total of 4 candidate display modes are proposed, namely averaging the highest 1%, 10%, 20% and 30% of pixel values along the Z-axis for each coordinate (x, y). In this paper, the subjective evaluation was carried out first, demonstrating that the second display mode, i.e. averaging the highest 10% of pixel values along the Z-axis, performed best. Subsequently, to further support the result of the subjective evaluation and to meet the high-throughput requirements of real-world applications, a total of 11 objective no-reference IQA (Image Quality Assessment) algorithms were implemented, including 4 opinion-aware approaches, viz. GMLF, NFERM, BLIINDS2, BRISQUE, and 7 opinion-unaware approaches, viz. CPBD, FISBLIM, NIQE, QAC, SISBLIM, S3, Fish_bb. The results of the objective evaluation show that current objective IQA algorithms can hardly support the subjective evaluation. Even so, BLIINDS2 and CPBD perform relatively well for the display mode chosen above. A more suitable objective evaluation method needs to be explored in future work. This study makes some progress on the display quality of THz images, which can improve detection accuracy in future applications.
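The candidate display modes can be sketched as a projection that, for each (x, y), averages the largest p% of the samples along Z. The volume layout `volume[z][y][x]`, the function name and the rounding rule (round the sample count up, keep at least one sample) are assumptions for illustration; the abstract does not specify them.

```python
def top_percent_projection(volume, percent):
    """Collapse volume[z][y][x] to image[y][x] by averaging, at each (x, y),
    the highest `percent` percent of the values along the z-axis."""
    depth = len(volume)
    # Assumed rounding rule: round up, and always keep at least one sample.
    keep = max(1, (depth * percent + 99) // 100)
    height, width = len(volume[0]), len(volume[0][0])
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            column = sorted((volume[z][y][x] for z in range(depth)), reverse=True)
            row.append(sum(column[:keep]) / keep)
        image.append(row)
    return image

# Hypothetical 10-slice volume with a 1 x 2 image plane.
volume = [[[z, 10 * z]] for z in range(10)]

mode_10 = top_percent_projection(volume, 10)  # keeps the single largest sample
mode_30 = top_percent_projection(volume, 30)  # keeps the three largest samples
print(mode_10)  # [[9.0, 90.0]]
print(mode_30)  # [[8.0, 80.0]]
```

Smaller percentages approach a maximum-intensity projection, while larger ones average over more slices.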

international conference on cloud computing | 2016

Distinguish True or False 4K Resolution Using Frequency Domain Analysis and Free-Energy Modelling

Wenhan Zhu; Guangtao Zhai; Jing Liu; Jia Wang; Xiaokang Yang

With the prevalence of Ultra-High Definition (UHD) display terminals, 4K resolution (3840×2160 pixels) content is becoming a major selling point for online video media. However, due to the scarcity of natural UHD content, a large number of false 4K videos are circulating on the web. Such 4K content, usually upscaled from lower resolutions, often frustrates enthusiastic consumers and in fact wastes limited bandwidth resources. In this paper, we propose to distinguish natural 4K content from false 4K content through frequency domain analysis. The basic assumption is that true 4K content has much stronger high-frequency responses than upscaled versions. We use free energy modelling to approximate the Human Visual System so as to minimize the impact of the structural complexity of visual content. We set up a dataset containing more than 1k original 4K frames together with upscaled versions produced by four widely used interpolation algorithms. Experimental results show that the proposed free-energy-based metric has an accuracy rate higher than 90%.
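The basic assumption stated here, that upscaled content carries less high-frequency energy than native content, can be illustrated on a one-dimensional signal. This is only a toy sketch of that assumption, not the paper's free-energy-based metric: the test signal, the cutoff at a quarter of the sampling rate, the factor-of-two downscaling and the linear interpolation are all hypothetical choices.

```python
import cmath
import math

def dft_energy(signal):
    """Squared DFT magnitudes for bins 0 .. n/2 (naive O(n^2) transform)."""
    n = len(signal)
    energies = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(signal))
        energies.append(abs(coeff) ** 2)
    return energies

def high_frequency_share(signal):
    """Fraction of non-DC energy at or above a quarter of the sampling rate."""
    energies = dft_energy(signal)
    cutoff = len(signal) // 4
    return sum(energies[cutoff:]) / sum(energies[1:])

def downscale_then_upscale(signal):
    """Drop every other sample, then restore the length by linear interpolation
    (wrapping around at the end so the signal stays periodic)."""
    low = signal[::2]
    up = []
    for m, value in enumerate(low):
        nxt = low[(m + 1) % len(low)]
        up.append(value)
        up.append((value + nxt) / 2)
    return up

# Hypothetical "native" row: a coarse component plus fine detail.
n = 64
native = [math.sin(2 * math.pi * 3 * i / n) + 0.5 * math.sin(2 * math.pi * 25 * i / n)
          for i in range(n)]
upscaled = downscale_then_upscale(native)

share_native = high_frequency_share(native)
share_upscaled = high_frequency_share(upscaled)
print(f"native: {share_native:.4f}, upscaled: {share_upscaled:.4f}")
```

A real detector would work on two-dimensional frames and would need a threshold tuned on labelled data, which this sketch does not attempt.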

international symposium on circuits and systems | 2018