Shih-Hao Hung | Researchain

Publication

Featured research published by Shih-Hao Hung.

Computers & Mathematics With Applications | 2012

Executing mobile applications on the cloud: Framework and issues

Shih-Hao Hung; Chi-Sheng Shih; Jeng-Peng Shieh; Chen-Pang Lee; Yi-Hsiang Huang

Modern mobile devices, such as smartphones and tablets, have made many pervasive computing dreams come true. Still, many mobile applications do not perform well due to the shortage of resources for computation, data storage, network bandwidth, and battery capacity. While such applications can be re-designed with client-server models to benefit from cloud services, the users are no longer in full control of the application, which has become a serious concern for data security and privacy. In addition, the collaboration between a mobile device and a cloud server poses complex performance issues associated with the exchange of application state, synchronization of data, network conditions, etc. In this work, a novel mobile cloud execution framework is proposed to execute mobile applications in a cloud-based virtualized execution environment controlled by mobile applications and users, with encryption and isolation to protect against eavesdropping from cloud providers. Under this framework, several efficient schemes have been developed to deal with technical issues for migrating applications and synchronizing data between execution environments. The communication issues in the virtualized execution environment are also addressed with a probabilistic communication Quality-of-Service (QoS) technique to support timely application migration.

International Conference on Parallel Processing | 2011

CSR: A Cloud-Assisted Speech Recognition Service for Personal Mobile Device

Yu-Shuo Chang; Shih-Hao Hung; Nick J. C. Wang; Bor-Shen Lin

Automatic speech recognition (ASR) is a technology that converts phrases or words spoken by humans into text. As a mature technology, ASR has become an alternative input method on many mobile devices, complementing the other input methods operated by hand. Although the technology has been developed for years, the accuracy and computational complexity of ASR have prevented it from being used as the primary input method on mobile devices. While speaker-dependent ASR (SD-ASR) technologies may be used to improve the recognition accuracy, users are often reluctant to go through the time-consuming training process needed to enable SD-ASR for each device. To overcome these problems, we propose a cloud-assisted speech recognition service and its infrastructural design, called CSR, which utilizes servers in the cloud to accelerate ASR, integrates SD-ASR technologies to improve the accuracy of ASR, and populates SD information to enable SD-ASR on multiple mobile devices. We have built a prototype to observe the benefits of the CSR service and the issues in load balance, power consumption, and privacy on the client and the server. We show that the CSR service offers fast responses, good accuracy, high availability, and good scalability in serving many concurrent users.

ACM Symposium on Applied Computing | 2010

Energy-efficient real-time scheduling of multimedia tasks on multi-core processors

Yi-Hung Wei; Chuan-Yue Yang; Tei-Wei Kuo; Shih-Hao Hung; Yuan-Hua Chu

In recent years, various multi-core architectures have become popular choices for the design of mobile platforms. With the strong computing demands of many multimedia applications, how to utilize the computing power of mobile platforms energy-efficiently without violating timing constraints has become a critical design problem. In this paper, a data-partitioning-based approach is proposed to exploit the parallelism of multimedia workload processing over multiple cores. Dynamic voltage scaling and dynamic power management strategies are both considered, for the dynamic scaling of the computing power of cores and the adjustment of the set of active cores, respectively. The practicability and the energy efficiency of the proposed algorithms were evaluated by a series of experiments and simulations, for which we have encouraging results.
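As a rough illustration of the trade-off this abstract describes, the sketch below searches over the number of active cores (dynamic power management) and the core frequency (dynamic voltage scaling) for the lowest-energy setting that still meets a deadline. It is not the paper's algorithm. The function name `pick_configuration`, the cubic dynamic-power model, the per-core static power, and all numeric values are assumptions made only for this example, and the work is assumed to split evenly across cores.

```python
from itertools import product


def pick_configuration(total_cycles, deadline_s, max_cores, freqs_hz,
                       dyn_coeff=1e-27, static_w=0.05):
    """Return (energy_j, cores, freq_hz) for the cheapest setting that meets
    the deadline, or None if no setting does. All parameters are hypothetical."""
    best = None
    for cores, freq in product(range(1, max_cores + 1), freqs_hz):
        # Data partitioning: assume the workload splits evenly over active cores.
        runtime = total_cycles / (cores * freq)
        if runtime > deadline_s:
            continue  # this setting would violate the timing constraint
        # Assumed power model: dynamic power grows with f^3, plus a fixed
        # static cost for every core that stays switched on.
        power = cores * (dyn_coeff * freq ** 3 + static_w)
        energy = power * runtime
        if best is None or energy < best[0]:
            best = (energy, cores, freq)
    return best


if __name__ == "__main__":
    # One frame's worth of work: 4e8 cycles due in 0.5 s on up to 4 cores.
    print(pick_configuration(4e8, 0.5, 4, [2e8, 4e8, 8e8]))
```

Under this toy model a single core can only meet the deadline at the highest frequency, while spreading the work over more cores makes a lower, cheaper frequency feasible.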

International Conference on Parallel Processing | 1994

A Hierarchical Approach to Modeling and Improving the Performance of Scientific Applications on the KSR1

Eric L. Boyd; Waqar Azeem; Hsien-Hsin Lee; Tien-Pao Shih; Shih-Hao Hung; Edward S. Davidson

We have developed a hierarchical performance bounding methodology that attempts to explain the performance of loop-dominated scientific applications on particular systems. The Kendall Square Research KSR1 is used as a running example. We model the throughput of key hardware units that are common bottlenecks in concurrent machines. The four units currently used are: memory port, floating-point, instruction issue, and a loop-carried dependence pseudo-unit. We propose a workload characterization, and derive upper bounds on the performance of specific machine-workload pairs. Comparing delivered performance with bounds focuses attention on areas for improvement and indicates how much improvement might be attainable. We delineate a comprehensive approach to modeling and improving application performance on the KSR1. Application of this approach is being automated for the KSR1 with a series of tools including K-MA and K-MACSTAT (which enable the calculation of the MACS hierarchy of performance bounds), K-Trace (which allows parallel code to be instrumented to produce a memory reference trace), and K-Cache (which simulates inter-cache communications based on a memory reference trace).
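The throughput-bound idea in this abstract can be sketched in a few lines: if each loop iteration places a known demand on each hardware unit, the most heavily loaded unit sets a lower bound on cycles per iteration, which is an upper bound on performance. This is a simplified illustration, not the MACS hierarchy or the K-MA tool. The function name `iteration_bound`, the operation counts, and the unit throughputs are hypothetical values chosen for the example.

```python
def iteration_bound(op_counts, throughputs):
    """Lower bound on cycles per loop iteration.

    op_counts:   demand placed on each unit by one iteration (hypothetical)
    throughputs: operations each unit can complete per cycle (hypothetical)
    Returns (bound_cycles, bottleneck_unit).
    """
    per_unit = {unit: op_counts[unit] / throughputs[unit] for unit in op_counts}
    # The unit that needs the most cycles limits the whole loop.
    bottleneck = max(per_unit, key=per_unit.get)
    return per_unit[bottleneck], bottleneck


# Illustrative numbers only; they are not measurements of the KSR1.
ops = {"memory_port": 6, "floating_point": 4,
       "instruction_issue": 14, "loop_carried_dependence": 3}
rate = {"memory_port": 1.0, "floating_point": 1.0,
        "instruction_issue": 2.0, "loop_carried_dependence": 1.0}

bound_cycles, unit = iteration_bound(ops, rate)
measured_cycles = 10.0  # assumed delivered performance, cycles per iteration
gap = measured_cycles / bound_cycles  # how far the code runs from the bound

if __name__ == "__main__":
    print(bound_cycles, unit, round(gap, 2))
```

The ratio between measured and bounded cycles indicates how much improvement might still be attainable, and the bottleneck unit indicates where to look first.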

Innovative Mobile and Internet Services in Ubiquitous Computing | 2011

An Online Migration Environment for Executing Mobile Applications on the Cloud

Shih-Hao Hung; Chi-Sheng Shih; Jeng-Peng Shieh; Chen-Pang Lee; Yi-Hsiang Huang

Modern smartphones have made many pervasive computing dreams come true. Still, many mobile applications do not perform well due to the shortage of resources for computation, data storage, network bandwidth, and battery capacity. While such applications can be re-designed with client-server models to benefit from cloud services, the users are no longer in full control of the application, which has become a serious concern. We propose a cloud/mobile execution framework to execute mobile applications in a cloud-based virtualized execution environment controlled by mobile applications and users, with encryption and isolation to protect against eavesdropping from cloud providers. We have developed efficient schemes for migrating applications and synchronizing data between execution environments. We also address the communication issues in the virtualized execution environment with a probabilistic communication Quality-of-Service (QoS) technique to support timely application migration.

Embedded and Real-Time Computing Systems and Applications | 2009

Zero-Buffer Inter-core Process Communication Protocol for Heterogeneous Multi-core Platforms

Yu Hsien Lin; Chia Heng Tu; Chi-Sheng Shih; Shih-Hao Hung

Executing functional components in a pipeline on heterogeneous multi-core platforms can greatly improve parallelism but requires a large amount of data communication among processes and threads. Our studies showed that existing inter-process/thread communication protocols involve many unnecessary memory copies and prolong the execution of applications on heterogeneous multi-core platforms. Our experiments also showed that communication overhead accounts for 40% to 50% of total execution time on average for H.264 video decoding applications. In this paper, we present the design and implementation of a zero-buffer inter-core process communication protocol, named NTU ICPC, to shorten the communication overhead of pipelined applications on heterogeneous multi-core platforms. NTU ICPC uses polling-based mail notification to avoid unnecessary context switches, and includes a memory subsystem to manage the input and output data between senders and receivers. The protocol was implemented and evaluated on a heterogeneous multi-core platform for several usage scenarios, including the H.264 encoding process. The evaluation results show that the communication overhead on the sender side is independent of the data size and that on the receiver side is greatly shortened, compared to several inter-process communication (IPC) protocols including mailbox, message queue, and shared memory. When encoding H.264 video clips, the encoding frame rate increases by more than 30%.

ACM Transactions on Design Automation of Electronic Systems | 2014

Performance and power profiling for emulated Android systems

Chia Heng Tu; Hui Hsin Hsu; Jen Hao Chen; Chun Han Chen; Shih-Hao Hung

Simulation is a common approach for assisting system design and optimization. For system-wide optimization, energy and computational resources are often the two most critical issues. Monitoring the energy state of each hardware component and measuring the time spent in each state is needed for accurate energy and performance prediction. For software optimization, it is important to profile the energy and the time consumed by each software construct in a realistic operating environment with a proper workload. However, the conventional approaches to simulation often fail to produce satisfactory data. First, building a cycle-accurate simulation environment for a complex system, such as an Android smartphone, is difficult and can take a long time. Second, a slow simulation can significantly alter the behavior of multithreaded, I/O-intensive applications and can affect the accuracy of profiles. Third, existing software-based profilers generally do not work on simulators, which makes performance analysis difficult for complicated software, for example, Java applications executed by the Dalvik VM in an Android system. To address these problems, we proposed and prototyped a framework called virtual performance analyzer (VPA). VPA takes advantage of an existing emulator or virtual machine monitor to reduce the complexity of building a simulator. VPA allows the user to selectively and incrementally integrate timing models and power models into the emulator with our carefully designed performance/power monitors, tracing facility, and profiling tools to evaluate and analyze the emulated system. The emulated system can perform at different levels of speed to help verify whether the profile data are impacted by the emulation speed. Finally, VPA supports existing software-based profilers and enables non-intrusive tracing/profiling by minimizing the probe effect. Our experimental results show that the VPA framework allows users to quickly establish a performance/power evaluation environment and gather useful information to support system design and software optimization for Android smartphones.

ACM Transactions on Embedded Computing Systems | 2012

An adaptive file-system-oriented FTL mechanism for flash-memory storage systems

Yuan-Hao Chang; Po-Liang Wu; Tei-Wei Kuo; Shih-Hao Hung

As flash memory becomes popular across various platforms, there is a strong demand to address the performance degradation caused by the special characteristics of flash memory. This research proposes the design of a file-system-oriented flash translation layer, in which a filter mechanism is designed to separate the access requests for file-system metadata from those for file contents for better performance. A recovery scheme is then proposed for maintaining the integrity of a file system. The proposed flash translation layer is implemented as a Linux device driver and evaluated with respect to the ext2 and ext3 file systems. Experiments were also done over NTFS with a series of realistic traces. The experimental results show significant performance improvement over the ext2, ext3, and NTFS file systems with limited system overheads.

Asia and South Pacific Design Automation Conference | 2012

System-wide profiling and optimization with virtual machines

Shih-Hao Hung; Tei-Wei Kuo; Chi-Sheng Shih; Chia Heng Tu

Simulation is a common approach for assisting system design and optimization. For system-wide optimization, energy and computational resources are often the two most critical limitations. Modeling the energy states of each hardware component and the time spent in each state is needed for accurate energy and performance prediction. Tracking software execution in a realistic operating environment with properly modeled input/output is key to accurate prediction. However, the conventional approaches can have difficulties in practice. First, for a complex system such as an Android smartphone, building a cycle-accurate simulation environment is no easy task. Second, for I/O-intensive applications, a slow simulation would significantly alter the application behavior and change its performance profile. Third, conventional software profiling tools generally do not work on simulators, which makes performance analysis difficult for complicated software, e.g., Java applications executed by the Dalvik virtual machine. Recently, virtual machine technologies have been widely used to emulate a variety of computer systems. While virtual machines do not model the hardware components in the emulated system, we can ease the effort of building a simulation environment by leveraging the infrastructure of virtual machines and adding performance and power models. Moreover, multiple sets of performance and energy models can be selectively used to verify whether the speed of the simulated system impacts the software behavior. Finally, performance monitoring facilities can be integrated to work with profiling tools. We believe this approach should help overcome these difficulties. We have prototyped a framework, and our case studies showed that the information provided by our tools is useful for software optimization and system design for Android smartphones.

Embedded and Real-Time Computing Systems and Applications | 2010

Designing and Implementing a Portable, Efficient Inter-core Communication Scheme for Embedded Multicore Platforms

Shih-Hao Hung; Wen Long Yang; Chia Heng Tu

In recent years, multicore processor designs have become increasingly popular for embedded applications, but diversified inter-core communication mechanisms have led to difficulties in software development, integration, and migration. A unified, portable, and efficient inter-core communication mechanism would help reduce these difficulties significantly, but such a solution does not exist today. We propose a scheme called MSG, which provides users with a set of essential message-passing programming interfaces adopted from MPI and MCAPI, including blocking and non-blocking point-to-point communications, one-sided communications, and collective operations. We evaluated our design methodology with a case study on the IBM CELL, a popular heterogeneous multicore platform. On the CELL platform, our MSG library fit in the 256KB local memory on each individual processor core and outperformed two existing communication libraries, DaCS and CML. With a systematic approach, we showed how optimization could be done on the CELL platform to improve the performance of the MSG library. We hope our experiences will help the design and development of communication libraries for existing and future multicore platforms and embedded applications.
