Wen-Huang Cheng
Center for Information Technology
Publications
Featured research published by Wen-Huang Cheng.
User Interface Software and Technology | 2013
Liwei Chan; Rong-Hao Liang; Ming-Chang Tsai; Kai-Yin Cheng; Chao-Huai Su; Mike Y. Chen; Wen-Huang Cheng; Bing-Yu Chen
We present FingerPad, a nail-mounted device that turns the tip of the index finger into a touchpad, allowing private and subtle interaction while on the move. FingerPad enables touch input through magnetic tracking, by adding a Hall sensor grid to the index fingernail and a magnet to the thumbnail. Since it accepts input through the pinch gesture, FingerPad is suitable for private use: the movements of the fingers in a pinch are subtle and naturally hidden by the hand. Functionally, FingerPad resembles a touchpad and also allows eyes-free use. Additionally, since the necessary devices are attached to the nails, FingerPad preserves natural haptic feedback without affecting the native function of the fingertips. Through a user study, we analyze three design factors, namely posture, commitment method, and target size, to assess the design of FingerPad. Though the results show some trade-offs among the factors, participants generally achieve 93% accuracy for very small targets (1.2 mm wide) in the seated condition and 92% accuracy for 2.5 mm-wide targets in the walking condition.
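As an illustration of the sensing idea, the sketch below shows one plausible way a Hall-sensor grid could be decoded into a 2D touch position, using a magnitude-weighted centroid over the array. The grid geometry, sensor pitch, and noise threshold are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch: decode a Hall-sensor grid into a 2D touch position.
# All constants here (pitch, noise floor) are illustrative assumptions.
import numpy as np

def decode_touch(readings: np.ndarray, pitch_mm: float = 2.0,
                 noise_floor: float = 0.05):
    """Estimate the magnet (thumb) position over an HxW Hall sensor grid.

    readings    -- HxW array of field magnitudes, one per sensor
    pitch_mm    -- assumed spacing between adjacent sensors
    noise_floor -- magnitudes below this are treated as no contact
    """
    active = np.clip(readings - noise_floor, 0.0, None)
    total = active.sum()
    if total == 0.0:
        return None  # fingers not pinching: no touch event
    h, w = readings.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Weighted centroid interpolates between sensors for sub-pitch accuracy.
    x = (xs * active).sum() / total * pitch_mm
    y = (ys * active).sum() / total * pitch_mm
    return (x, y)
```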
IEEE Transactions on Circuits and Systems for Video Technology | 2007
Wen-Huang Cheng; Chia-Wei Wang; Ja-Ling Wu
The browsing of quality videos on small hand-held devices is a common scenario in pervasive media environments. In this paper, we propose a novel framework for video adaptation based on content recomposition. Our objective is to provide effective small-size videos that emphasize the important aspects of a scene while faithfully retaining the background context. This is achieved by explicitly separating the manipulation of different video objects. A generic video attention model is developed to extract user-interest objects, in which a high-level combination strategy is proposed for fusing three types of visual attention features: intensity, color, and motion. Based on the knowledge of media aesthetics, a set of aesthetic criteria is presented. Accordingly, these objects are reintegrated with the directly resized background to optimally match specific screen sizes. Experimental results demonstrate the efficiency and effectiveness of our approach.
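To make the fusion step concrete, here is a minimal sketch of combining the three named attention features (intensity, color, motion) into a single saliency map. The per-map normalization and fixed weights are illustrative assumptions; the paper's actual combination strategy is high-level and content-dependent.

```python
# Minimal sketch of fusing intensity/color/motion attention maps.
# Weights are assumed values, not taken from the paper.
import numpy as np

def normalize(m: np.ndarray) -> np.ndarray:
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def fuse_attention(intensity: np.ndarray, color: np.ndarray,
                   motion: np.ndarray, w=(0.3, 0.3, 0.4)) -> np.ndarray:
    # Normalize each map to [0, 1] so no single feature dominates by scale.
    maps = [normalize(m) for m in (intensity, color, motion)]
    saliency = sum(wi * mi for wi, mi in zip(w, maps))
    return normalize(saliency)

# Pixels above a threshold on the fused map could then be grouped into
# user-interest objects, with the remainder kept as background context.
```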
OncoTargets and Therapy | 2015
Kai-Lung Hua; Che-Hao Hsu; Shintami Chusnul Hidayati; Wen-Huang Cheng; Yu-Jen Chen
Lung cancer has a poor prognosis when not diagnosed early and unresectable lesions are present. The management of small lung nodules noted on computed tomography scans is controversial due to uncertain tumor characteristics. A conventional computer-aided diagnosis (CAD) scheme requires several image processing and pattern recognition steps to accomplish a quantitative tumor differentiation result. In such an ad hoc image analysis pipeline, every step depends heavily on the performance of the previous step. Accordingly, tuning classification performance in a conventional CAD scheme is very complicated and arduous. Deep learning techniques, on the other hand, have the intrinsic advantage of automatic feature exploitation and seamless performance tuning. In this study, we attempted to simplify the image analysis pipeline of conventional CAD with deep learning techniques. Specifically, we introduced a deep belief network and a convolutional neural network in the context of nodule classification in computed tomography images. Two baseline methods with feature computing steps were implemented for comparison. The experimental results suggest that deep learning methods can achieve better discriminative results and hold promise for the CAD application domain.
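As a rough illustration of the second model type, the sketch below shows a small convolutional network of the kind that could classify nodule patches from CT slices, written in PyTorch. The layer sizes, the 32x32 patch size, and the binary head are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch of a CNN for nodule patch classification (assumed sizes).
import torch
import torch.nn as nn

class NoduleCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                 # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                 # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, 32, 32) grayscale CT patches
        h = self.features(x)
        return self.classifier(h.flatten(1))

model = NoduleCNN()
logits = model(torch.randn(4, 1, 32, 32))  # four dummy patches
```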
Multimedia Information Retrieval | 2003
Wen-Huang Cheng; Wei-Ta Chu; Ja-Ling Wu
Semantic context detection is one of the key techniques for facilitating efficient multimedia retrieval. A semantic context is a scene that completely represents a meaningful segment of information to human beings. In this paper, we propose a novel hierarchical approach that models the statistical characteristics of several audio events over a time series to accomplish semantic context detection. The approach consists of two stages: audio event detection and semantic context detection. Hidden Markov models (HMMs) are used to model basic audio events, and event detection is performed in the first stage. Semantic context detection is then achieved with Gaussian mixture models, which model the temporal correlations among several audio events. With this framework, we bridge the gap between low-level features and semantic contexts that extend over a time series. The experimental evaluations indicate that the approach is effective in detecting high-level semantics.
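A minimal sketch of this two-stage pipeline, assuming hmmlearn for the event HMMs and scikit-learn's GaussianMixture for the context stage, might look as follows. The event names, feature dimensions, and component counts are illustrative assumptions.

```python
# Sketch of the two-stage pipeline: per-event HMMs, then a context GMM.
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.mixture import GaussianMixture

EVENTS = ["speech", "music", "crowd"]  # assumed audio event vocabulary

# Stage 1: one HMM per audio event, trained on frame-level audio features
# (e.g., MFCC matrices of shape (n_frames, n_dims)).
event_models = {e: GaussianHMM(n_components=3) for e in EVENTS}
# for e in EVENTS:
#     event_models[e].fit(training_frames[e])   # hypothetical training data

def event_likelihoods(window: np.ndarray) -> np.ndarray:
    """Score one analysis window against every event HMM."""
    return np.array([event_models[e].score(window) for e in EVENTS])

# Stage 2: a GMM per semantic context, fitted on sequences of these
# event-likelihood vectors, capturing how events co-occur over time.
context_model = GaussianMixture(n_components=4)
# context_model.fit(np.vstack([event_likelihoods(w) for w in windows]))
```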
IEEE Transactions on Multimedia | 2012
Yin-Hsi Kuo; Wen-Huang Cheng; Hsuan-Tien Lin; Winston H. Hsu
We have witnessed the exponential growth of images and videos with the prevalence of capture devices and the ease of social services such as Flickr and Facebook. Meanwhile, these enormous media collections come with rich contextual cues such as tags, geo-locations, descriptions, and time. To obtain desired images, users usually issue a query to a search engine using either an image or keywords. Therefore, existing solutions for image retrieval rely only on either the image content (e.g., low-level features) or the surrounding text (e.g., descriptions, tags). These solutions usually suffer from low recall rates because small changes in lighting conditions, viewpoints, occlusions, or (missing) noisy tags can degrade performance significantly. In this work, we tackle the problem by leveraging both the image content and the associated textual information in social media to approximate semantic representations for the two modalities. We propose a general framework to augment each image with relevant semantic (visual and textual) features by using graphs among images. The framework automatically discovers relevant semantic features by propagation and selection in textual and visual image graphs in an unsupervised manner. We investigate the effectiveness of the framework when using different optimization methods for maximizing efficiency. The proposed framework can be directly applied to various applications, such as keyword-based image search, image object retrieval, and tag refinement. Experimental results confirm that the proposed framework effectively improves the performance of these emerging image retrieval applications.
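The propagation step can be sketched as a simple iterative smoothing over an image affinity graph, in the spirit of the framework described; the affinity matrix, blending weight, and iteration count below are illustrative assumptions.

```python
# Sketch of unsupervised feature propagation on an image graph.
# W is an assumed affinity matrix between images (visual or textual).
import numpy as np

def propagate_features(X: np.ndarray, W: np.ndarray,
                       alpha: float = 0.5, iters: int = 20) -> np.ndarray:
    """Smooth per-image feature vectors X (n_images x n_dims) over graph W."""
    # Row-normalize so each image averages over its graph neighbors.
    W = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    F = X.copy()
    for _ in range(iters):
        # Blend each image's features with its neighbors', keeping a pull
        # back toward the original features so evidence is not washed out.
        F = alpha * (W @ F) + (1.0 - alpha) * X
    return F
```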
IEICE Transactions on Information and Systems | 2005
Wen-Huang Cheng; Wei-Ta Chu; Ja-Ling Wu
This paper presents a framework for automatic video region-of-interest determination based on a visual attention model. We view this work as a preliminary step toward the solution of high-level semantic video analysis. Facing such a challenging issue, in this work we make a set of attempts to use video attention features and knowledge of computational media aesthetics. The three types of visual attention features we use are intensity, color, and motion. Referring to aesthetic principles, these features are combined according to camera motion types on the basis of a newly proposed video analysis unit, the frame-segment. We conduct subjective experiments on several kinds of video data and demonstrate the effectiveness of the proposed framework.
ACM Multimedia | 2012
Yan Ching Lin; Min Chun Hu; Wen-Huang Cheng; Yung Huan Hsieh; Hong Ming Chen
Observing the widespread use of Kinect-like depth cameras, in this work we investigate the problem of using depth data alone for human action recognition and retrieval in videos. We propose the use of simple depth descriptors, without learned optimization, that achieve promising performance comparable to that of leading methods based on color images and videos and can be applied effectively in real-time applications. Because of the infrared nature of depth cameras, the proposed approach is especially useful under poor lighting conditions, e.g., surveillance environments without sufficient lighting. We also propose a large Depth-included Human Action video dataset, namely DHA, which contains 357 videos of performed human actions belonging to 17 categories. To the best of our knowledge, DHA is one of the largest depth-included video datasets of human actions.
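As one plausible instance of a simple, learning-free depth descriptor of the kind described, the sketch below accumulates a motion-energy map from frame-to-frame depth differences and matches actions by nearest neighbor. The threshold and the descriptor itself are illustrative assumptions rather than the paper's exact features.

```python
# Sketch of a learning-free depth descriptor plus nearest-neighbor matching.
import numpy as np

def depth_motion_energy(frames: np.ndarray, thresh: float = 10.0) -> np.ndarray:
    """frames: (T, H, W) depth video -> flattened motion-energy descriptor."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    energy = (diffs > thresh).mean(axis=0)   # fraction of frames moving per pixel
    v = energy.ravel()
    return v / (np.linalg.norm(v) + 1e-12)   # unit-normalize for matching

def classify(query: np.ndarray, gallery: np.ndarray, labels):
    """Nearest-neighbor action label; gallery is (N, D) descriptors."""
    d = np.linalg.norm(gallery - query, axis=1)
    return labels[int(np.argmin(d))]
```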
International Conference on Computer Graphics and Interactive Techniques | 2012
Sheng-Jie Luo; I-Chao Shen; Bing-Yu Chen; Wen-Huang Cheng; Yung-Yu Chuang
This paper presents a novel technique for seamless stereoscopic image cloning, which performs both shape adjustment and color blending such that the stereoscopic composite is seamless in both perceived depth and color appearance. The core of the proposed method is an iterative disparity adaptation process that alternates between two steps: disparity estimation, which re-estimates the disparities in the gradient domain so that the disparities are continuous across the boundary of the cloned region; and perspective-aware warping, which locally re-adjusts the shape and size of the cloned region according to the estimated disparities. This process not only guarantees depth continuity across the boundary but also models local perspective projection in accordance with the disparities, leading to more natural stereoscopic composites. The proposed method allows for easy cloning of objects with intricate silhouettes and vague boundaries because it does not require precise segmentation of the objects. Several challenging cases demonstrate that our method generates more compelling results than methods with only global shape adjustment.
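The gradient-domain disparity re-estimation can be sketched as a discrete Poisson problem: keep the source's disparity gradients inside the cloned region while pinning the boundary to the target's disparities. The plain Jacobi solver and iteration count below are illustrative assumptions.

```python
# Sketch of gradient-domain disparity blending as a Poisson problem.
import numpy as np

def blend_disparity(src: np.ndarray, dst: np.ndarray, mask: np.ndarray,
                    iters: int = 500) -> np.ndarray:
    """src/dst: (H, W) disparity maps; mask: True inside the cloned region."""
    # Guidance field: the discrete Laplacian of the source disparities.
    lap = (np.roll(src, 1, 0) + np.roll(src, -1, 0) +
           np.roll(src, 1, 1) + np.roll(src, -1, 1) - 4 * src)
    out = dst.copy()
    for _ in range(iters):
        nb = (np.roll(out, 1, 0) + np.roll(out, -1, 0) +
              np.roll(out, 1, 1) + np.roll(out, -1, 1))
        # Jacobi update only inside the mask; boundary pixels stay at dst,
        # which is what makes the result continuous across the seam.
        out[mask] = (nb[mask] - lap[mask]) / 4.0
    return out
```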
International Symposium on Circuits and Systems | 2005
Wen-Huang Cheng; Wei-Ta Chu; J.B. Kuo; Ja-Ling Wu
This paper presents a framework for automatic video region-of-interest determination based on a user attention model. We make a set of attempts to use video attention features and knowledge of applied media aesthetics. The three types of visual attention features used are intensity, color, and motion. Referring to aesthetic principles, these features are combined according to camera motion types on the basis of a newly proposed video analysis unit, the frame-segment. We conduct subjective experiments on several kinds of video data and demonstrate the effectiveness of the proposed framework.
Pacific Rim Conference on Multimedia | 2003
Chia-Chiang Ho; Wen-Huang Cheng; Ting-Jian Pan; Ja-Ling Wu
In this paper, a generic user-attention-based focus detection framework is developed to capture user focus points in video frames. The proposed framework considers both bottom-up and top-down attention, and integrates both image-based and video-based visual features for saliency map computation. For efficiency, the number of adopted features is kept as small as possible. The realized framework is extensible and flexible in integrating more features with a variety of fusion schemes. One application of the proposed framework, user-assisted spatial resolution reduction, is also addressed.
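A minimal sketch of fusing bottom-up and top-down attention into a single focus map is given below; the center-weighted prior stands in for task knowledge and, like the mixing weight, is an illustrative assumption.

```python
# Sketch of bottom-up + top-down fusion into one focus map.
import numpy as np

def focus_map(bottom_up: np.ndarray, beta: float = 0.6) -> np.ndarray:
    """bottom_up: (H, W) saliency map from image/video features."""
    h, w = bottom_up.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Top-down stand-in: viewers tend to look near the frame center.
    center = np.exp(-(((ys - h / 2) / h) ** 2 + ((xs - w / 2) / w) ** 2) / 0.08)
    fused = beta * bottom_up + (1 - beta) * center
    return fused / (fused.max() + 1e-12)

# The peak of the fused map yields a focus point per frame:
# y, x = np.unravel_index(np.argmax(focus_map(saliency)), saliency.shape)
```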