Yew-Huey Liu | Researchain


Publication

Featured research published by Yew-Huey Liu.

international conference on computer communications | 2011

Application-aware virtual machine migration in data centers

Vivek Shrivastava; Petros Zerfos; Kang-Won Lee; Hani Jamjoom; Yew-Huey Liu; Suman Banerjee

While virtual machine (VM) migration allows data centers to rebalance workloads across physical machines, the promise of a maximally utilized infrastructure has yet to be realized. Part of the challenge is due to the inherent dependencies between the VMs that make up a multi-tier application, which introduce complex load interactions between the underlying physical servers. For example, simply moving an overloaded VM to a (random) underloaded physical machine can inadvertently overload the network. We introduce AppAware, a novel, computationally efficient scheme for incorporating (1) inter-VM dependencies and (2) the underlying network topology into VM migration decisions. Using simulations, we show that our proposed method decreases network traffic by up to 81% compared to a well-known alternative VM migration method that is not application-aware.
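To make dependency- and topology-aware migration concrete, here is a minimal sketch. It is not AppAware's algorithm: the cost model (traffic rate between dependent VMs multiplied by the hop count between their hosts) and every name in it (`placement_cost`, `choose_target`, the `traffic` and `hops` tables) are hypothetical assumptions for illustration. Given a VM that must move, it picks the host with spare capacity that keeps the VM's busiest peers closest.

```python
from typing import Dict, List

# Hypothetical inputs (assumptions, not from the paper):
#   traffic[vm][peer] -> traffic rate between two dependent VMs
#   location[vm]      -> host the VM currently runs on
#   hops[h1][h2]      -> network hops between two hosts (0 = same host)
Traffic = Dict[str, Dict[str, float]]
Hops = Dict[str, Dict[str, int]]


def placement_cost(vm: str, host: str, traffic: Traffic,
                   location: Dict[str, str], hops: Hops) -> float:
    """Traffic-weighted network distance if `vm` were placed on `host`."""
    return sum(rate * hops[host][location[peer]]
               for peer, rate in traffic.get(vm, {}).items())


def choose_target(vm: str, candidates: List[str], free_capacity: Dict[str, float],
                  demand: float, traffic: Traffic, location: Dict[str, str],
                  hops: Hops) -> str:
    """Among hosts with enough spare capacity, pick the one with the lowest cost."""
    feasible = [h for h in candidates if free_capacity[h] >= demand]
    if not feasible:
        raise ValueError("no candidate host can absorb this VM")
    return min(feasible,
               key=lambda h: placement_cost(vm, h, traffic, location, hops))


# Toy three-tier application: web <-> app <-> db, with "app" overloaded on h3.
hops = {"h1": {"h1": 0, "h2": 2, "h3": 4},
        "h2": {"h1": 2, "h2": 0, "h3": 4},
        "h3": {"h1": 4, "h2": 4, "h3": 0}}
traffic = {"app": {"web": 10.0, "db": 30.0}}
location = {"web": "h1", "app": "h3", "db": "h2"}
print(choose_target("app", ["h1", "h2"], {"h1": 4.0, "h2": 4.0}, 2.0,
                    traffic, location, hops))  # h2
```

Moving `app` next to `db` (cost 20) beats moving it next to `web` (cost 60) because most of its traffic goes to the database; a dependency-blind choice could pick either host.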

virtual execution environments | 2011

Overdriver: handling memory overload in an oversubscribed cloud

Dan Williams; Hani Jamjoom; Yew-Huey Liu; Hakim Weatherspoon

With the intense competition between cloud providers, oversubscription is increasingly important to maintain profitability. Oversubscribing physical resources is not without consequences: it increases the likelihood of overload. Memory overload is particularly damaging. Contrary to traditional views, our analysis of current data center logs and realistic Web workloads shows that overload is largely transient: up to 88.1% of overloads last for less than 2 minutes. Treating overload as a continuum that includes both transient and sustained overloads of various durations leads us to view mitigation approaches as a continuum as well, each with its own tradeoffs in application performance and data center overhead. In particular, heavyweight techniques, like VM migration, are better suited to sustained overloads, whereas lightweight approaches, like network memory, are better suited to transient overloads. We present Overdriver, a system that adaptively takes advantage of these tradeoffs, mitigating all overloads within 8% of well-provisioned performance. Furthermore, under reasonable oversubscription ratios, where transient overload constitutes the vast majority of overloads, Overdriver requires 15% of the excess space and generates a factor of four less network traffic than a migration-only approach.
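The continuum argument can be illustrated with a simple escalation policy: absorb every overload with the lightweight mechanism first, and fall back to the heavyweight one only if the overload outlasts a threshold. The sketch below is a hypothetical simplification, not Overdriver's actual decision logic; the `mitigate` function, the `Decision` record, and the 120-second default threshold (picked only because the abstract reports that most overloads end within 2 minutes) are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Decision:
    duration_s: float        # how long the overload lasted
    migrated: bool           # True if the VM was eventually migrated
    network_memory_s: float  # time the overload was absorbed by network memory


def mitigate(durations_s: List[float], threshold_s: float = 120.0) -> List[Decision]:
    """Hypothetical two-stage policy: every overload is first absorbed by
    network memory; one still active after `threshold_s` seconds is treated
    as sustained and the VM is migrated."""
    decisions = []
    for d in durations_s:
        if d <= threshold_s:
            decisions.append(Decision(d, False, d))            # transient
        else:
            decisions.append(Decision(d, True, threshold_s))   # sustained
    return decisions


for dec in mitigate([30.0, 90.0, 600.0]):
    print(dec)
```

Only the 600-second overload pays the migration cost; the two short ones never leave network memory, which mirrors the tradeoff described in the abstract.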

IEEE Transactions on Computers | 1993

A quantitative evaluation of cache types for high-performance computer systems

Ching-Farn Eric Wu; Yarsun Hsu; Yew-Huey Liu

Parallel accesses to the table lookaside buffer (TLB) and cache array are crucial for high-performance computer systems, and the choice of cache type is one of the most important factors affecting cache performance. The authors classify caches according to both index and tag. Since both index and tag can be either virtual (V) or real (R), this classification yields four combinations, or cache types. The real address caches with virtual tags for high-performance computer systems in this study are prediction-based, since index bits are generated from a small array and predictions can be false. As a result, the authors also discuss and evaluate real address MRU caches with real tags, and propose virtually indexed MRU caches with real tags. The four cache types and the MRU caches are each discussed and evaluated using trace-driven simulation. The results show that a virtually indexed MRU cache with real tags is a good choice for high-performance computer systems.
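The index/tag classification can be made concrete with a few lines of address arithmetic. The sketch assumes a hypothetical geometry (4 KiB pages, 64-byte lines, 256 sets) that is not taken from the paper. It shows why a virtually indexed cache with real tags can select a set before address translation finishes, and why, once the index extends past the page offset, two virtual synonyms of the same physical line can fall into different sets.

```python
# Hypothetical geometry for illustration only (not taken from the paper).
PAGE_BITS = 12   # 4 KiB pages: bits 0..11 are the page offset
LINE_BITS = 6    # 64-byte lines
SET_BITS = 8     # 256 sets: the index uses address bits 6..13


def set_index(addr: int, line_bits: int = LINE_BITS, set_bits: int = SET_BITS) -> int:
    """Set selected by an address. In a virtually indexed cache this is the
    virtual address, so set selection can overlap the TLB lookup."""
    return (addr >> line_bits) & ((1 << set_bits) - 1)


def real_tag(phys_addr: int, page_bits: int = PAGE_BITS) -> int:
    """Real tag: here simply the physical page number delivered by the TLB."""
    return phys_addr >> page_bits


def index_exceeds_page_offset(line_bits: int = LINE_BITS, set_bits: int = SET_BITS,
                              page_bits: int = PAGE_BITS) -> bool:
    """True when some index bits lie above the page offset, so virtual and
    real indices can differ and synonyms may occupy different sets."""
    return line_bits + set_bits > page_bits


# Two virtual synonyms of one physical line (physical address 0x5040).
va_a, va_b, pa = 0x1040, 0x2040, 0x5040
print(set_index(va_a), set_index(va_b), real_tag(pa))  # 65 129 5
print(index_exceeds_page_offset())                      # True
```

With only 64 sets (`set_bits=6`) the index stays inside the page offset and both synonyms select the same set, at the price of a smaller or more associative cache.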

IEEE Transactions on Computers | 1995

Efficient stack simulation for set-associative virtual address caches with real tags

Ching-Farn Eric Wu; Yarsun Hsu; Yew-Huey Liu

Stack simulation is a powerful cache analysis approach that generates the number of misses and write-backs for various cache configurations in a single run. Unfortunately, no previous work on stack simulation provides an efficient stack algorithm for virtual address caches with real tags (VIR-type caches). In this paper, we devise an efficient stack simulation algorithm for analyzing VIR-type caches. Using markers with a valid range for synonym lines, our algorithm keeps track of stack distances for different cache configurations. In addition to cache miss ratios and write-back ratios, our approach generates pseudonym frequency for all cache configurations under investigation.
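The single-run property of stack simulation comes from LRU inclusion: a line found at depth d in its set's LRU stack hits in every cache with that number of sets and at least d ways. The sketch below is the classic LRU stack algorithm of Mattson et al., not the authors' VIR algorithm: it ignores synonyms and write-backs, maps lines to sets by line address modulo the set count, and its function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List


def stack_distances(trace: Iterable[int], num_sets: int, line_bytes: int) -> Counter:
    """One pass of LRU stack simulation for a fixed number of sets.
    Returns a histogram of stack depths; depth 0 marks a first reference."""
    stacks: Dict[int, List[int]] = {}
    hist: Counter = Counter()
    for addr in trace:
        line = addr // line_bytes
        stack = stacks.setdefault(line % num_sets, [])
        if line in stack:
            hist[stack.index(line) + 1] += 1   # depth 1 = most recently used
            stack.remove(line)
        else:
            hist[0] += 1                       # cold miss for every configuration
        stack.insert(0, line)                  # move/push to the top of the stack
    return hist


def misses(hist: Counter, ways: int) -> int:
    """Misses of an LRU cache with `ways` lines per set: cold misses plus
    references found deeper than `ways` in the stack."""
    return sum(n for depth, n in hist.items() if depth == 0 or depth > ways)


# One pass over the trace answers the question for every associativity.
h = stack_distances([1, 2, 3, 1, 2, 3, 4, 1], num_sets=1, line_bytes=1)
print([misses(h, w) for w in (1, 2, 3, 4)])  # [8, 8, 5, 4]
```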

IEEE Parallel & Distributed Technology: Systems & Applications | 1996

A Unified Trace Environment for IBM SP systems

Ching-Farn Eric Wu; Hubertus Franke; Yew-Huey Liu

C. Eric Wu, Hubertus Franke, and Yew-Huey Liu, IBM T.J. Watson Research Center

Distributed parallel processing can increase system computing power beyond the limits of current uniprocessor technology. However, programming such a system with the message-passing programming model is much more complex than writing sequential programs. To take advantage of the underlying hardware, it is critical to understand the communication behavior of parallel programs and the system's responses to user applications. One common way of monitoring a program's behavior is to generate trace events while executing the program. The generated events can then be used for other purposes, such as debugging and program visualization. However, as we'll see, such a method potentially requires source code modification, increases overhead, and causes clock-synchronization problems. To meet these challenges, we developed a Unified Trace Environment for IBM SP systems. The user-level UTE trace libraries require only relinking to generate message-passing and system events. With the UTE, users can generate message-passing events with minimal overhead and mark specific portions of the program, such as various phases, loops, and routines, for performance analysis and visualization. Most user-level trace tools for message-passing systems require source code modification to collect message-passing events. More advanced tools such as the Paradyn system require no source code modification; they insert the code for performance instrumentation into an application program during execution. However, instrumentation daemons cause substantial overhead. Collecting system events is as important as collecting message-passing events. System and I/O events such as process dispatch and page fault can reveal crucial information on system responses to user applications.
The trace facility should also easily expand to trace activities from other software layers, such as parallel I/O file systems and high-level parallel languages. Such expandability enables the same trace facility to trace multiple software systems. One of the most serious problems in trace analysis for distributed parallel systems is clock synchronization. In such a system, multiple processors generate trace records, and often multiple nodes produce separate streams independently. The logical order of events might not be guaranteed in the trace because of discrepancies among local clocks. As a result, many trace facilities must do additional work to ensure consistent time stamps, thus increasing trace overhead. The challenges of trace analysis

international conference on parallel and distributed systems | 1994

Trace-based analysis and tuning for distributed parallel applications

Ching-Farn Eric Wu; Yew-Huey Liu; C. Benveniste; C.-L. Chen; W.-H. Chiang

We present an integrated approach to timestamp consistency and trace-based performance analysis techniques for distributed parallel applications. Our trace generation facility captures message-passing and system events, such as process dispatch, with minimal trace overhead. Trace-driven analysis tools were developed for post-execution analysis, reporting information such as the time stolen by other processes on each node, and the observed message-passing time and local wait time for each message. We then present our techniques to reduce total elapsed time based on observed message-passing times and local wait times.
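As an illustration of one metric named above, the time stolen from a process by others on its node can be derived from process-dispatch events. The sketch assumes a single-CPU node and a hypothetical record format of (pid, dispatch time, undispatch time) intervals; the tool's actual trace format and method are not given in the abstract.

```python
from typing import List, Tuple

# Hypothetical dispatch record: (pid, time dispatched, time undispatched).
Interval = Tuple[int, float, float]


def stolen_time(dispatches: List[Interval], pid: int,
                window_start: float, window_end: float) -> float:
    """CPU time given to processes other than `pid` within
    [window_start, window_end) on a single-CPU node."""
    stolen = 0.0
    for other, start, end in dispatches:
        if other == pid:
            continue
        # Clip the other process's run interval to the observation window.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            stolen += overlap
    return stolen


events = [(7, 0.0, 4.0), (9, 4.0, 5.5), (7, 5.5, 9.0), (3, 9.0, 10.0)]
print(stolen_time(events, pid=7, window_start=0.0, window_end=10.0))  # 2.5
```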

international parallel processing symposium | 1995

Timestamp consistency and trace-driven analysis for distributed parallel systems

Ching-Farn Eric Wu; Yew-Huey Liu; Yarsun Hsu

A continuous stream of event data describing the progress of parallel program execution is realized for trace-driven analysis. Unfortunately, separate streams are often produced independently by multiple processors in the system, and the logical order of events cannot be guaranteed because of discrepancies among local clocks. We present an integrated approach to timestamp consistency and performance analysis techniques for IBM SPn systems. The trace facility requires no source code modification and can generate message-passing and system events with minimal trace overhead. Trace-driven analysis tools are developed to extract useful information. Analysis results for NAS kernel benchmarks are reported.
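One simple way to restore a consistent event order, sketched here under stated assumptions, is to look for constant per-node clock offsets such that every message is received after it is sent. This is a swapped-in textbook technique (difference constraints solved with Bellman-Ford), not the paper's method. It assumes clocks differ only by a constant offset, with no drift, and the `Message` tuple layout, the `clock_offsets` name, and the `min_latency` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (sender node, send timestamp, receiver node, receive
# timestamp), each timestamp read from the local clock of the node that logged it.
Message = Tuple[int, float, int, float]


def clock_offsets(messages: List[Message], nodes: List[int],
                  min_latency: float = 0.0) -> Dict[int, float]:
    """Constant per-node offsets such that, after adding them, every message
    is received at least `min_latency` after it was sent.

    Each message gives the difference constraint
        off[sender] - off[receiver] <= recv - send - min_latency,
    solved as shortest paths (Bellman-Ford) from a virtual source, which is
    why every offset starts at 0.
    """
    # Edge receiver -> sender with weight w encodes off[sender] <= off[receiver] + w.
    edges = [(r, s, recv - send - min_latency) for s, send, r, recv in messages]
    off = {n: 0.0 for n in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, w in edges:
            if off[u] + w < off[v]:
                off[v] = off[u] + w
                changed = True
        if not changed:
            return off
    # Still relaxing after |nodes| rounds: a negative cycle, i.e. no constant
    # offsets can explain the trace (for example, because clocks drift).
    raise ValueError("no consistent constant clock offsets exist")


# Node 1's clock runs ahead: its message appears to arrive before it was sent.
msgs = [(1, 10.0, 0, 8.0), (0, 20.0, 1, 30.0)]
print(clock_offsets(msgs, nodes=[0, 1], min_latency=1.0))  # {0: 0.0, 1: -3.0}
```

Subtracting 3 time units from node 1's timestamps makes both messages arrive at least one unit after they were sent. Real traces also need drift correction, which this constant-offset sketch does not attempt.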

international conference on parallel processing | 1993

Efficient Stack Simulation for Shared Memory Set-Associative Multiprocessor Caches

C. Eric Wu; Yarsun Hsu; Yew-Huey Liu

We propose efficient stack simulation algorithms for shared memory multiprocessor (MP) caches. A stack simulation algorithm for write-update MP caches is presented first. It produces the number of write-updates as well as misses for all cache configurations in a single run. We then devise a new stack simulation algorithm for write-invalidate MP caches. Our algorithm takes into account cross-invalidation among processors and generates the number of invalidations as well as misses for all cache configurations in a single run. A cache simulator based on our algorithms for MP caches is developed, and the results on sample traces are reported. Our results show that efficient stack simulation is a powerful technique for multiprocessor cache analysis.
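The write-update case can be sketched by keeping one LRU stack per processor: a write by one processor sends an update to every other processor whose stack holds the line, and the line's depth in that stack tells which cache sizes would have received the update. This is a minimal sketch under assumptions (fully associative per-processor LRU caches, write-allocate, updates that do not change the receiving cache's LRU order, hypothetical function names), not the authors' algorithm. Cross-invalidation in write-invalidate caches removes lines from other stacks and needs extra bookkeeping beyond plain LRU stacks, so it is not modeled here.

```python
from collections import Counter
from typing import List, Tuple

Access = Tuple[int, str, int]  # (processor, "r" or "w", line address)


def simulate_write_update(trace: List[Access], num_procs: int) -> Tuple[Counter, Counter]:
    """Return (miss_hist, update_hist): histograms of LRU stack depths, where
    depth 0 in miss_hist marks a first reference by that processor."""
    stacks: List[List[int]] = [[] for _ in range(num_procs)]
    miss_hist: Counter = Counter()
    update_hist: Counter = Counter()
    for proc, op, line in trace:
        stack = stacks[proc]
        if line in stack:
            miss_hist[stack.index(line) + 1] += 1
            stack.remove(line)
        else:
            miss_hist[0] += 1
        stack.insert(0, line)
        if op == "w":
            # Every other cache holding the line receives a write-update;
            # its LRU order is left unchanged.
            for other in range(num_procs):
                if other != proc and line in stacks[other]:
                    update_hist[stacks[other].index(line) + 1] += 1
    return miss_hist, update_hist


def misses(miss_hist: Counter, size: int) -> int:
    """Misses when each processor's cache holds `size` lines."""
    return sum(n for d, n in miss_hist.items() if d == 0 or d > size)


def updates(update_hist: Counter, size: int) -> int:
    """Write-updates received when each processor's cache holds `size` lines."""
    return sum(n for d, n in update_hist.items() if d <= size)


trace = [(0, "r", 1), (1, "r", 1), (0, "w", 1),
         (1, "r", 2), (1, "r", 3), (0, "w", 1)]
m, u = simulate_write_update(trace, num_procs=2)
print([updates(u, s) for s in (1, 2, 3)], misses(m, 3))  # [1, 1, 2] 4
```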

international conference on parallel and distributed systems | 1996

A distributed connection manager interface for web services on IBM SP systems

Yew-Huey Liu; Paul M. Dantzig; Ching-Farn Eric Wu; Lionel M. Ni

In essence, the World Wide Web is a worldwide string of computer databases using a common information retrieval architecture. With the increasing popularity of the World Wide Web, more and more functions have been added to retrieve not only documents written in HTML (Hypertext Markup Language) but also those in other forms, through the Common Gateway Interface (CGI), by constructing HTML documents dynamically. Dynamic construction of HTML documents for handling information such as digital libraries is slow and requires much more computing power. A significant performance bottleneck is the initialization and setup phase a CGI process goes through to gain access to the system containing the data. In this paper we describe the design and implementation of a Connection Manager Interface on IBM SP systems. The Connection Manager provides cliette processes to serve CGI requests and eliminates this bottleneck. An IBM SP system is used for this emerging area to show that our design and implementation are flexible enough to take advantage of the High-Performance Switch in an IBM SP system. We trace and monitor this scalable Web service using UTE (Unified Trace Environment) tools, and present its performance analysis and visualization.
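The bottleneck described above, a per-request setup phase before a CGI process can reach the data, is the classic motivation for keeping pre-initialized connections alive and reusing them. The sketch below shows that general pooling idea with Python threads; it is not the paper's Connection Manager (which serves requests with cliette processes on an SP system), and `ConnectionPool`, `open_backend`, and the pool size are hypothetical.

```python
import queue
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class ConnectionPool(Generic[T]):
    """Keeps `size` pre-initialized backend connections and lends them out,
    so individual requests skip the expensive setup phase."""

    def __init__(self, factory: Callable[[], T], size: int) -> None:
        self._idle: "queue.Queue[T]" = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())       # pay the setup cost once, up front

    def run(self, handler: Callable[[T], R]) -> R:
        conn = self._idle.get()             # blocks until a connection is free
        try:
            return handler(conn)
        finally:
            self._idle.put(conn)            # return the connection for reuse


setup_count = 0
setup_lock = threading.Lock()


def open_backend() -> dict:
    """Stand-in for a costly login/attach step (hypothetical)."""
    global setup_count
    with setup_lock:
        setup_count += 1
        return {"id": setup_count}


pool = ConnectionPool(open_backend, size=2)
results: list = []
workers = [threading.Thread(target=lambda: results.append(pool.run(lambda c: c["id"])))
           for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(setup_count, len(results))  # 2 8
```

Eight requests are served while the setup step runs only twice; the queue also caps how many requests touch the backend at once.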

international conference on pervasive services | 2005

Pervasive computing technologies for retail in-store shopping

Jih-Shyr Yih; Florian Pinel; Yew-Huey Liu; Trieu C. Chieu

Retailers are constantly in search of ways to enhance customer satisfaction so as to differentiate themselves from the competition and increase revenue. This paper describes an in-store commerce server implementation that leverages pervasive computing technologies to redefine the in-store shopping experience. The server evolves existing point-of-sale systems into a store integration platform complete with reusable in-store solution building blocks. The paper illustrates how customer touch points such as cart-mounted Web pads can be supported to enable location-sensitive, personalized shopping assistance and incremental self-checkout. New collaborative shopping paradigms can be created by service-oriented process choreography with other sales channels.
