Ming-Chang Lee | Researchain

Publication

Featured researches published by Ming-Chang Lee.

Fundamental Approaches to Software Engineering | 2016

ABS-YARN: A Formal Framework for Modeling Hadoop YARN Clusters

Jia-Chun Lin; Ingrid Chieh Yu; Einar Broch Johnsen; Ming-Chang Lee

In cloud computing, software which does not flexibly adapt to deployment decisions either wastes operational resources or requires reengineering, both of which may significantly increase costs. However, this could be avoided by analyzing deployment decisions already during the design phase of the software development. Real-Time ABS is a formal language for executable modeling of deployed virtualized software. Using Real-Time ABS, this paper develops a generic framework called ABS-YARN for YARN, which is the next generation of the Hadoop cloud computing platform with a state-of-the-art resource negotiator. We show how ABS-YARN can be used for prototyping YARN and for modeling job execution, allowing users to rapidly make deployment decisions at the modeling level and reduce unnecessary costs. To validate the modeling framework, we show strong correlations between our model-based analyses and a real YARN cluster in different scenarios with benchmarks.

Advanced Information Networking and Applications | 2013

Deriving Job Completion Reliability and Job Energy Consumption for a General MapReduce Infrastructure from Single-Job Perspective

Jia-Chun Lin; Fang-Yie Leu; Ming-Chang Lee; Ying-ping Chen

MapReduce as a master-slave infrastructure consists of two master-side servers and a large number of slave-side working nodes. In this paper, we derive a job completion reliability (JCR for short) model and a job energy consumption (JEC for short) model from a single-job perspective for a general MapReduce infrastructure in which no redundancy scheme is adopted on the master side, and a cold-standby scheme is employed on the slave side. Without loss of generality, the JCR model is derived based on a Poisson distribution. Through the simulation and analytical results, MapReduce managers and service providers can comprehend how this infrastructure behaves and how to improve the infrastructure so as to achieve a more reliable and energy-efficient MapReduce environment.

IEEE Transactions on Parallel and Distributed Systems | 2016

Hybrid Job-Driven Scheduling for Virtual MapReduce Clusters

Ming-Chang Lee; Jia-Chun Lin; Ramin Yahyapour

It is cost-efficient for a tenant with a limited budget to establish a virtual MapReduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing environment, we propose in this paper a hybrid job-driven scheduling scheme (JoSS for short) from a tenant's perspective. JoSS provides not only job-level scheduling, but also map-task level scheduling and reduce-task level scheduling. JoSS classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy to schedule each class of jobs. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS are further introduced to separately achieve a better map-data locality and a faster task assignment. We conduct extensive experiments to evaluate and compare the two variations with current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are separately suitable for different MapReduce-workload scenarios and provide the best job performance among all tested algorithms.

Concurrency and Computation: Practice and Experience | 2016

Performance evaluation of job schedulers on Hadoop YARN

Jia-Chun Lin; Ming-Chang Lee

To address Hadoop's limitations in scalability, resource sharing, and application support, the open-source community proposed the next generation of Hadoop's compute platform, called Yet Another Resource Negotiator (YARN), by separating resource management functions from the programming model. This separation enables various application types to run on YARN in parallel. To achieve fair resource sharing and high resource utilization, YARN provides the capacity scheduler and the fair scheduler. However, the performance impacts of the two schedulers are not clear when mixed applications run on a YARN cluster. Therefore, in this paper, we study four scheduling-policy combinations (SPCs for short) derived from the two schedulers and then evaluate the four SPCs in extensive scenarios, which consider not only four application types, but also three different queue structures for organizing applications. The experimental results enable YARN managers to comprehend the influences of different SPCs and different queue structures on mixed applications. The results also help them to select a proper SPC and an appropriate queue structure to achieve better application execution performance.

Innovative Mobile and Internet Services in Ubiquitous Computing | 2013

TSR: Topology Reduction from Tree to Star Data Grids

Ming-Chang Lee; Fang-Yie Leu; Ying-ping Chen

To speed up data transmission in data grids, several co-allocation schemes have been proposed. However, data grids are often large in scale, heterogeneous in participating resources, and complicated in architecture and network topology, which increases the analytical complexity of their data transmission behaviour. In other words, if we can reduce the data transmission topology for the grid, the analysis will be easier. Therefore, in this paper, we propose a topology reduction approach, called the Tree-to-Star Reduction method (TSR for short), which can reduce a packet delivery tree topology to a star for a data grid so that the data transmission of a co-allocation scheme can be more conveniently analyzed. Here, a delivery tree topology, as a tree topology rooted at the destination node, is a network topology for delivering all fragments of a file to the destination node.

Broadband and Wireless Computing, Communication and Applications | 2011

Improving Data Grids Performance by Using Popular File Replicate First Algorithm

Fang-Yie Leu; Ming-Chang Lee; Jia-Chun Lin

In this paper, we propose an adaptive data replication algorithm, called the Popular File Replicate First algorithm (PFRF for short), which is developed on a star-topology data grid with limited storage space based on aggregated information on previous file accesses. PFRF periodically calculates file access popularity to track the variation of users' access behaviors, and then replicates popular files to appropriate sites to adapt to the variation. We employ several types of file access behaviors, including Zipf-like, geometric, and uniform distributions, to evaluate PFRF. The simulation results show that PFRF can effectively improve average job turnaround time and data availability as compared with those of the tested algorithms.
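The periodic popularity step described in this abstract can be illustrated with a minimal sketch. It is not the PFRF algorithm from the paper: the function names `popular_files` and `plan_replication`, the plain access-count popularity measure, and the `top_k` cutoff are all assumptions made for illustration, and site selection and storage limits are left out.

```python
from collections import Counter


def popular_files(access_log, top_k):
    """Return the top_k most frequently accessed file ids in one period.

    access_log is a hypothetical list of file ids, one entry per access.
    Popularity here is a raw access count, which is an assumption.
    """
    counts = Counter(access_log)
    return [file_id for file_id, _ in counts.most_common(top_k)]


def plan_replication(access_log, top_k, already_replicated):
    """List popular files that a site does not yet hold a replica of."""
    return [
        file_id
        for file_id in popular_files(access_log, top_k)
        if file_id not in already_replicated
    ]
```

Calling `plan_replication` once per period with that period's log mimics the idea of tracking changes in access behavior over time.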

World Wide Web | 2015

Pareto-based cache replacement for YouTube

Ming-Chang Lee; Fang-Yie Leu; Ying-ping Chen

Recently, YouTube, which plays diverse video programs for worldwide users, has been one of the most attractive social-networking systems. YouTube employs a distributed memory caching system called Memcached to cache videos, and utilizes the Least Recently Used algorithm (LRU for short) to evict the least recently watched video when Memcached runs out of space. However, LRU may cause a high miss count, which is the number of times that a video requested by users cannot be found in Memcached. This might not only increase network overhead, but also cause a poor service quality for YouTube since those videos need to be retrieved from the remote back-end database. To solve these problems, in this paper, we classify videos into popular and unpopular videos and propose two cache replacement algorithms based on the Pareto principle. One is the Pareto-based Least Frequently Used algorithm (PLFU for short), and the other is the Pareto-based Least Recently Used algorithm (PLRU for short). The two algorithms always keep several top popular videos of each video category in Memcached to reduce miss count. However, when Memcached has insufficient space to hold a video requested by a user, PLFU and PLRU repeatedly evict an unpopular video from Memcached based on LFU and LRU, respectively, so as to hold the video. Our simulation results based on a real-world YouTube trace show that PLFU performs the best among all tested algorithms in terms of miss count and video-retrieval time. The results also indicate that when PLRU is used for a longer time, it provides the second best performance.
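A minimal sketch of the PLRU idea follows: popular videos are kept in the cache, and unpopular videos are evicted in least-recently-used order until a requested video fits. This is an illustration under stated assumptions, not the authors' implementation. The class name `ParetoLRUCache`, the precomputed `popular_ids` set, and the size-based capacity are hypothetical, and the sketch does not talk to Memcached or handle per-category popularity.

```python
from collections import OrderedDict


class ParetoLRUCache:
    """Toy cache: popular videos are never evicted, unpopular ones use LRU."""

    def __init__(self, capacity, popular_ids):
        self.capacity = capacity              # total space, arbitrary units
        self.popular_ids = set(popular_ids)   # assumed to be known in advance
        self.popular = {}                     # video id -> size, kept in cache
        self.unpopular = OrderedDict()        # video id -> size, oldest first
        self.used = 0
        self.misses = 0

    def request(self, video_id, size):
        """Return True on a cache hit, False on a miss."""
        if video_id in self.popular:
            return True
        if video_id in self.unpopular:
            self.unpopular.move_to_end(video_id)  # mark as most recently used
            return True

        self.misses += 1
        # Repeatedly evict the least recently used unpopular video.
        while self.used + size > self.capacity and self.unpopular:
            _, evicted_size = self.unpopular.popitem(last=False)
            self.used -= evicted_size

        # Cache the video only if it now fits.
        if self.used + size <= self.capacity:
            if video_id in self.popular_ids:
                self.popular[video_id] = size
            else:
                self.unpopular[video_id] = size
            self.used += size
        return False
```

Swapping the `OrderedDict` for a structure ordered by access count would give the corresponding LFU-style variant.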

International Conference on Big Data | 2014

Scheduling MapReduce tasks on virtual MapReduce clusters from a tenant's perspective

Jia-Chun Lin; Ming-Chang Lee; Ramin Yahyapour

Renting a set of virtual private servers (VPSs for short) from a VPS provider to establish a virtual MapReduce cluster is cost-efficient for a company/organization. To shorten job turnaround time and keep data locality as high as possible in this type of environment, this paper proposes a Best-Fit Task Scheduling scheme (BFTS for short) from a tenant's perspective. BFTS schedules each map task to a VPS that can finish the task earlier than the other VPSs by predicting and comparing the time required by every VPS to retrieve the map-input data, execute the map task, and become idle in an online manner. Furthermore, BFTS schedules each reduce task to a VPS that is close to most VPSs that execute the related map tasks. We conduct extensive experiments to compare BFTS with several scheduling algorithms employed by Hadoop. The experimental results show that BFTS is better than the other tested algorithms in terms of map-data locality, reduce-data locality, and job turnaround time. The overhead incurred by BFTS is also evaluated, which is inevitable but acceptable compared with the other algorithms.
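The map-task placement rule in this abstract can be sketched as choosing the VPS with the earliest predicted finish time. The abstract names three predicted quantities but does not state how they are combined, so summing them is an assumption here. The names `VpsEstimate` and `pick_vps` are hypothetical, and the prediction of each quantity is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass
class VpsEstimate:
    """Predicted times (in seconds) for one candidate VPS and one map task."""
    name: str
    idle_in: float      # time until the VPS becomes idle
    fetch_time: float   # time to retrieve the map-input data
    exec_time: float    # time to execute the map task

    def finish_time(self):
        # Assumption: the three predicted times simply add up.
        return self.idle_in + self.fetch_time + self.exec_time


def pick_vps(estimates):
    """Return the candidate with the smallest predicted finish time."""
    return min(estimates, key=VpsEstimate.finish_time)
```

With this rule, a busy VPS that already holds the input data can still win over an idle VPS that must fetch the data remotely, which is how the sketch trades off waiting time against data locality.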

Intelligent Networking and Collaborative Systems | 2014

Developing a Performance-Analysis Model for a Star-Topology Data Grid from Multi-user Perspective

Ming-Chang Lee; Fang-Yie Leu; Ying-ping Chen

Data grids often integrate many geographically dispersed storage and computational resources to provide an environment in which scientists and researchers can execute their large-scale applications and store data collected from other databases or produced by their experiments. To efficiently transfer a huge volume of data shared by users among the clusters or nodes in data grids, several co-allocation schemes have been proposed to shorten data delivery. In this paper, we develop a performance-analysis model, named the Response Time Evaluation Model (RTEM for short), over a topology reduced from a tree to a star to evaluate data grid co-allocation schemes from a multi-user perspective.

Computer Software and Applications Conference | 2015

ReMBF: A Reliable Multicast Brute-Force Co-allocation Scheme for Multi-user Data Grids

Ming-Chang Lee; Fang-Yie Leu; Ying-ping Chen

In this paper we propose a novel co-allocation scheme, called a Reliable Multicast Brute-Force co-allocation scheme (ReMBF for short), which employs a reliable multicast (RM for short) technique with the Brute-Force (BF for short) scheme to accelerate data retrieval and delivery, and reliably transmit data to its users for data grids. Several types of data access patterns, including Zipf-like, geometric, and uniform distributions, are utilized to model user access behaviors and evaluate the performance of ReMBF. The simulation results demonstrate that ReMBF can efficiently deliver a bulk of data in a shorter time period compared with two state-of-the-art schemes.
