Myeongjae Jeon
Rice University
Publication
Featured research published by Myeongjae Jeon.
Journal of Parallel and Distributed Computing | 2011
Hyun-Gul Roh; Myeongjae Jeon; Jin-Soo Kim; Joonwon Lee
For distributed applications requiring collaboration, responsive and transparent interactivity is highly desired. Though such interactivity can be achieved with optimistic replication, maintaining replica consistency is difficult. To support efficient implementations of collaborative applications, this paper extends a few representative abstract data types (ADTs), such as arrays, hash tables, and growable arrays (or linked lists), into replicated abstract data types (RADTs). In RADTs, a shared ADT is replicated and modified with optimistic operations. Operation commutativity and precedence transitivity are two principles enabling RADTs to maintain consistency despite different execution orders. In particular, replicated growable arrays (RGAs) support insertion, deletion, and update operations. Compared with previous approaches to optimistic insertion and deletion, RGAs show significant improvements in performance, scalability, and reliability.
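To make the commutativity idea concrete, here is a minimal sketch of an RGA-style linked list. The `Node`/`RGA` names and the (counter, replica) element IDs are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical RGA sketch: elements carry unique, totally ordered IDs
# (here (counter, replica_id) pairs); a remote insertion skips concurrent
# successors with larger IDs so replicas converge in any delivery order.

class Node:
    def __init__(self, uid, value):
        self.uid = uid          # (counter, replica_id), totally ordered
        self.value = value
        self.deleted = False    # tombstone so deletes commute with inserts
        self.next = None

class RGA:
    def __init__(self):
        self.head = Node((0, -1), None)   # sentinel

    def _find(self, uid):
        node = self.head
        while node is not None and node.uid != uid:
            node = node.next
        return node

    def insert_after(self, target_uid, uid, value):
        """Apply a (possibly remote) insertion of `uid` after `target_uid`."""
        prev = self._find(target_uid)
        # Skip concurrently inserted successors with larger IDs so that two
        # replicas applying the same operations in different orders produce
        # the same list (operation commutativity).
        while prev.next is not None and prev.next.uid > uid:
            prev = prev.next
        node = Node(uid, value)
        node.next = prev.next
        prev.next = node

    def delete(self, uid):
        """Leave a tombstone so concurrent inserts still find their anchor."""
        self._find(uid).deleted = True

    def to_list(self):
        out, node = [], self.head.next
        while node is not None:
            if not node.deleted:
                out.append(node.value)
            node = node.next
        return out

# Two replicas applying the same concurrent insertions in opposite orders
# converge to the same sequence.
a, b = RGA(), RGA()
ops = [((0, -1), (1, 0), "x"), ((0, -1), (1, 1), "y")]
for op in ops:
    a.insert_after(*op)
for op in reversed(ops):
    b.insert_after(*op)
assert a.to_list() == b.to_list()
```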
IEEE Transactions on Computers | 2011
Jae-Wan Jang; Myeongjae Jeon; Hyo-Sil Kim; Heeseung Jo; Jin-Soo Kim; Seungryoul Maeng
Increasing energy consumption in server consolidation environments leads to high maintenance costs for data centers. Main memory, no less than the processor, is a major energy consumer in this environment. This paper proposes a technique for reducing memory energy consumption using virtual machine scheduling in multicore systems. We devise several heuristic scheduling algorithms by using a memory power simulator, which we designed and implemented. We also implement the biggest cover set first (BCSF) scheduling algorithm in a working server system. Through extensive simulation and implementation experiments, we observe the effectiveness of memory-aware virtual machine scheduling in saving memory energy. In addition, we find that power-aware memory management is essential to reducing memory energy consumption.
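A rough sketch of a BCSF-style selection step, under an assumed data model (each VM's working set as a set of memory-rank IDs; `bcsf_pick` and its greedy tie-break are illustrative, not the paper's algorithm):

```python
from collections import Counter

def bcsf_pick(vms, num_cores):
    """Greedy sketch: co-schedule up to num_cores VMs while keeping the
    union of active memory ranks small, so idle ranks can power down."""
    picked, active = [], set()
    remaining = dict(vms)
    while remaining and len(picked) < num_cores:
        # Prefer the VM that adds the fewest *new* ranks to the active set;
        # ties go to the VM whose ranks are shared by the most other VMs
        # (the "biggest cover set").
        share = Counter(r for rk in remaining.values() for r in rk)
        vm = min(remaining,
                 key=lambda v: (len(vms[v] - active),
                                -sum(share[r] for r in vms[v])))
        picked.append(vm)
        active |= vms[vm]
        del remaining[vm]
    return picked, active

# Example: four VMs over ranks 0..3 on a 2-core host.
vms = {"vm0": {0, 1}, "vm1": {0}, "vm2": {2, 3}, "vm3": {1}}
print(bcsf_pick(vms, num_cores=2))  # co-schedules the VMs sharing ranks 0/1
```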
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2014
Myeongjae Jeon; Saehoon Kim; Seung-won Hwang; Yuxiong He; Sameh Elnikety; Alan L. Cox; Scott Rixner
Web search engines are optimized to reduce the high-percentile response time to consistently provide fast responses to almost all user queries. This is a challenging task because the query workload exhibits large variability, consisting of many short-running queries and a few long-running queries that significantly impact the high-percentile response time. With modern multicore servers, parallelizing the processing of an individual query is a promising solution to reduce query execution time, but it gives limited benefits compared to sequential execution since most queries see little or no speedup when parallelized. The root of this problem is that short-running queries, which dominate the workload, do not benefit from parallelization. They incur a large parallelization overhead, taking scarce resources from long-running queries. On the other hand, parallelization substantially reduces the execution time of long-running queries with low overhead and high parallelization efficiency. Motivated by these observations, we propose a predictive parallelization framework with two parts: (1) predicting long-running queries, and (2) selectively parallelizing them. For the first part, prediction should be accurate and efficient. For accuracy, we study a comprehensive feature set covering both term features (reflecting dynamic pruning efficiency) and query features (reflecting query complexity). For efficiency, to keep overhead low, we avoid expensive features that have excessive requirements such as large memory footprints. For the second part, we use the predicted query execution time to parallelize long-running queries and process short-running queries sequentially. We implement and evaluate the predictive parallelization framework in Microsoft Bing search. Our measurements show that under moderate to heavy load, the predictive strategy reduces the 99th-percentile response time by 50% (from 200 ms to 100 ms) compared with prior approaches that parallelize all queries.
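The dispatch decision reduces to a threshold test on predicted execution time. Below is a hedged sketch with a toy predictor; the real predictor is learned from term and query features, and `LONG_QUERY_MS`, `predict_ms`, and `process_chunk` are assumptions for illustration.

```python
import concurrent.futures

LONG_QUERY_MS = 30.0   # assumed cutoff separating long from short queries

def predict_ms(query):
    # Toy stand-in for the learned predictor: assume cost grows with
    # query length.
    return 5.0 * len(query.split())

def process_chunk(query, shard):
    return f"results({query!r}, shard={shard})"

def dispatch(query, pool, num_shards=4):
    if predict_ms(query) >= LONG_QUERY_MS:
        # Predicted long-running: worth paying the parallelization overhead.
        futs = [pool.submit(process_chunk, query, s) for s in range(num_shards)]
        return [f.result() for f in futs]
    # Predicted short-running: sequential execution avoids the overhead
    # that dominates for the many short queries.
    return [process_chunk(query, 0)]

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    print(dispatch("cat", pool))                          # short -> sequential
    print(dispatch("a very long tail query " * 2, pool))  # long -> parallel
```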
European Conference on Computer Systems | 2013
Myeongjae Jeon; Yuxiong He; Sameh Elnikety; Alan L. Cox; Scott Rixner
A web search query made to Microsoft Bing is currently parallelized by distributing the query processing across many servers. Within each of these servers, the query is, however, processed sequentially. Although each server may be processing multiple queries concurrently, with modern multicore servers, parallelizing the processing of an individual query within a server may nonetheless improve the user's experience by reducing the response time. In this paper, we describe the issues that make the parallelization of an individual query within a server challenging, and we present a parallelization approach that effectively addresses these challenges. Since each server may be processing multiple queries concurrently, we also present an adaptive resource management algorithm that chooses the degree of parallelism at run time for each query, taking into account system load and parallelization efficiency. As a result, the servers now execute queries with a high degree of parallelism at low loads, gracefully reduce the degree of parallelism with increased load, and choose sequential execution under high load. We have implemented our parallelization approach and adaptive resource management algorithm in Bing servers and evaluated them experimentally with production workloads. The experimental results show that the mean and 95th-percentile response times for queries are reduced by more than 50% under light or moderate load. Moreover, under high load, where parallelization adversely degrades system performance, the response times are kept the same as when queries are executed sequentially. In all cases, we observe no degradation in the relevance of the search results.
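A sketch of the load-to-degree mapping; the thresholds below are made up for illustration, while the paper selects the degree from measured load and per-query parallelization efficiency:

```python
# Hypothetical load-adaptive degree selection; thresholds are illustrative,
# not the values used in Bing servers.

def choose_degree(load, max_degree=4):
    """load: fraction of busy worker threads, in [0, 1]."""
    if load < 0.3:
        return max_degree               # light load: high parallelism
    if load < 0.7:
        return max(1, max_degree // 2)  # moderate load: back off gracefully
    return 1                            # heavy load: sequential execution

for load in (0.1, 0.5, 0.9):
    print(load, "->", choose_degree(load))
```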
ACM Transactions on the Web | 2012
Myeongjae Jeon; Young-Jae Kim; Jeaho Hwang; Joonwon Lee; Euiseong Seo
With the ever-increasing popularity of Social Network Services (SNSs), an understanding of the characteristics of these services and their effects on the behavior of their host servers is critical. However, there has been a lack of research on the workload characterization of servers running SNS applications such as blog services. To fill this void, we empirically characterized real-world Web server logs collected from one of the largest South Korean blog hosting sites for 12 consecutive days. The logs consist of more than 96 million HTTP requests and 4.7TB of network traffic. Our analysis reveals the following: (i) The transfer size of nonmultimedia files and blog articles can be modeled using a truncated Pareto distribution and a log-normal distribution, respectively; (ii) user access for blog articles does not show temporal locality, but is strongly biased towards those posted with image or audio files. We additionally discuss the potential performance improvement through clustering of small files on a blog page into contiguous disk blocks, which benefits from the observed file access patterns. Trace-driven simulations show that, on average, the suggested approach achieves 60.6% better system throughput and reduces the processing time for file access by 30.8% compared to the best performance of the Ext4 filesystem.
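The two fitted models are straightforward to sample for trace synthesis. A sketch with placeholder parameters (the paper's fitted values are not reproduced here):

```python
import random

def truncated_pareto(alpha, lo, hi):
    """Inverse-CDF sample of a Pareto(alpha, lo) truncated at hi."""
    u = random.random()
    c = 1.0 - (lo / hi) ** alpha
    return lo / (1.0 - u * c) ** (1.0 / alpha)

def file_size_bytes():
    # Non-multimedia file transfer sizes: truncated Pareto
    # (alpha/lo/hi are placeholder parameters).
    return truncated_pareto(alpha=1.2, lo=1_000, hi=10_000_000)

def article_size_bytes():
    # Blog article sizes: log-normal (placeholder mu, sigma in log space).
    return random.lognormvariate(8.0, 1.0)

sizes = sorted(file_size_bytes() for _ in range(5))
print([round(s) for s in sizes], round(article_size_bytes()))
```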
Architectural Support for Programming Languages and Operating Systems | 2016
Myeongjae Jeon; Yuxiong He; Hwanju Kim; Sameh Elnikety; Scott Rixner; Alan L. Cox
In interactive services such as web search, recommendations, games and finance, reducing the tail latency is crucial to provide fast response to every user. Using web search as a driving example, we systematically characterize interactive workload to identify the opportunities and challenges for reducing tail latency. We find that the workload consists of mainly short requests that do not benefit from parallelism, and a few long requests which significantly impact the tail but exhibit high parallelism speedup. This motivates estimating request execution time, using a predictor, to identify long requests and to parallelize them. Prediction, however, is not perfect; a long request mispredicted as short is likely to contribute to the server tail latency, setting a ceiling on the achievable tail latency. We propose TPC, an approach that combines prediction information judiciously with dynamic correction for inaccurate prediction. Dynamic correction increases parallelism to accelerate a long request that is mispredicted as short. TPC carefully selects the appropriate target latencies based on system load and parallelism efficiency to reduce tail latency. We implement TPC and several prior approaches to compare them experimentally on a single search server and on a cluster of 40 search servers. The experimental results show that TPC reduces the 99th- and 99.9th-percentile latency by up to 40% compared with the best prior work. Moreover, we evaluate TPC on a finance server, demonstrating its effectiveness on reducing tail latency of interactive services beyond web search.
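A hedged sketch of the predict-then-correct dispatch. The deadlines and the `work_seq`/`work_par` hooks are assumptions, and the timeout is only a loose stand-in for TPC's correction mechanism; the paper derives its targets from system load and parallelism efficiency.

```python
import concurrent.futures, time

PREDICT_LONG_MS = 30.0   # parallelize immediately above this prediction
CORRECTION_MS = 20.0     # boost mispredicted-short requests at this point

def run(req, predicted_ms, work_seq, work_par, pool):
    if predicted_ms >= PREDICT_LONG_MS:
        return work_par(req)          # predicted long: parallel from the start
    fut = pool.submit(work_seq, req)
    try:
        # Correction point: a "short" request still running here was
        # mispredicted, so accelerate it before it defines the tail.
        return fut.result(timeout=CORRECTION_MS / 1000)
    except concurrent.futures.TimeoutError:
        fut.cancel()                  # best effort; a real engine migrates work
        return work_par(req)          # dynamic correction: add parallelism

with concurrent.futures.ThreadPoolExecutor() as pool:
    seq = lambda q: (time.sleep(0.05), f"seq({q})")[1]   # actually long
    par = lambda q: f"par({q})"
    print(run("tail query", predicted_ms=5.0, work_seq=seq, work_par=par,
              pool=pool))             # misprediction corrected -> par(...)
```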
ACM Transactions on Architecture and Code Optimization | 2013
Myeongjae Jeon; Conglong Li; Alan L. Cox; Scott Rixner
This article describes and evaluates a new approach to optimizing DRAM performance and energy consumption that is based on eagerly writing dirty cache lines to DRAM. Under this approach, many dirty cache lines are written to DRAM before they are evicted. In particular, dirty cache lines that have not been recently accessed are eagerly written to DRAM when the corresponding row has been activated by an ordinary, non-eager access, such as a read. This approach enables clustering of reads and writes that target the same row, resulting in a significant reduction in row activations. Specifically, for a variety of applications, it reduces the number of DRAM row activations by an average of 42% and a maximum of 82%. Moreover, the results from a full-system simulator show compelling performance improvements and energy consumption reductions. Out of 23 applications, 6 have overall performance improvements between 10% and 20%, and 3 have improvements in excess of 20%. Furthermore, 12 consume between 10% and 20% less DRAM energy, and 7 have energy consumption reductions in excess of 20%.
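A minimal sketch of the eager-writeback step at the point of a row activation, under a deliberately simplified cache/controller model (the classes and the line-to-row mapping are assumptions, not the simulated hardware):

```python
from dataclasses import dataclass, field

@dataclass
class Line:
    row: int             # DRAM row this line maps to (simplified mapping)
    dirty: bool
    recently_used: bool

@dataclass
class Controller:
    writes: list = field(default_factory=list)
    activations: int = 0

    def activate(self, row, cache):
        """An ordinary (non-eager) access, such as a read, opens `row`."""
        self.activations += 1
        # Eager step: while the row is open, write back dirty lines that
        # map to it and have not been used recently, clustering writes
        # with the read that opened the row and saving a later activation.
        for ln in cache:
            if ln.row == row and ln.dirty and not ln.recently_used:
                self.writes.append(ln)
                ln.dirty = False

cache = [Line(7, True, False), Line(7, True, True), Line(3, True, False)]
mc = Controller()
mc.activate(7, cache)
print(len(mc.writes), mc.activations)  # 1 eager write rides on 1 activation
```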
International Conference on Social Computing | 2010
Myeongjae Jeon; Jeaho Hwang; Young-Jae Kim; Jae-Wan Jang; Joonwon Lee; Euiseong Seo
Despite the growing popularity of Online Social Networks (OSNs), the workload characteristics of OSN servers, such as those hosting blog services, are not well understood. Understanding workload characteristics is important for optimizing and improving the performance of current systems and software based on observed trends. Thus, in this paper, we characterize the system workload of the largest blog hosting service in South Korea, Tistory. In addition to understanding the system workload of the blog hosting server, we have developed synthesized workloads and obtained the following major findings: (i) the transfer size of non-multimedia files and blog articles can be modeled by a truncated Pareto distribution and a log-normal distribution, respectively; and (ii) user accesses to blog articles do not show temporal locality, but they are strongly biased toward those posted along with images or audio.
Measurement and Modeling of Computer Systems | 2016
Iyswarya Narayanan; Di Wang; Myeongjae Jeon; Bikash Sharma; Laura Marie Caulfield; Anand Sivasubramaniam; Ben Cutler; Jie Liu; Badriddine Khessib; Kushagra Vaid
ACM International Conference on Systems and Storage | 2016
Iyswarya Narayanan; Di Wang; Myeongjae Jeon; Bikash Sharma; Laura Marie Caulfield; Anand Sivasubramaniam; Ben Cutler; Jie Liu; Badriddine Khessib; Kushagra Vaid