Yi-Cheng Tu | Researchain

Publication

Featured researches published by Yi-Cheng Tu.

international conference on data engineering | 2009

A Rule-Based Classification Algorithm for Uncertain Data

Biao Qin; Yuni Xia; Sunil Prabhakar; Yi-Cheng Tu

Data uncertainty is common in real-world applications due to various causes, including imprecise measurement, network latency, outdated sources and sampling errors. These kinds of uncertainty have to be handled cautiously, or else the mining results could be unreliable or even wrong. In this paper, we propose a new rule-based classification and prediction algorithm called uRule for classifying uncertain data. This algorithm introduces new measures for generating, pruning and optimizing rules. These new measures are computed considering uncertain data interval and probability distribution function. Based on the new measures, the optimal splitting attribute and splitting value can be identified and used for classification and prediction. The proposed uRule algorithm can process uncertainty in both numerical and categorical data. Our experimental results show that uRule has excellent performance even when data is highly uncertain.
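As a rough illustration of how an interval-valued attribute can contribute to a split measure, the sketch below computes the expected number of tuples on each side of a candidate splitting value. It assumes every uncertain value is uniformly distributed over its interval; that distribution, along with the names `prob_at_most` and `expected_counts`, is an assumption made for this example and is not taken from the uRule paper.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (low, high) bounds of an uncertain numeric value


def prob_at_most(interval: Interval, split: float) -> float:
    """P(value <= split), assuming the value is uniform on [low, high]."""
    low, high = interval
    if high <= low:  # degenerate interval: treat as a certain value
        return 1.0 if low <= split else 0.0
    if split <= low:
        return 0.0
    if split >= high:
        return 1.0
    return (split - low) / (high - low)


def expected_counts(values: List[Interval], split: float) -> Tuple[float, float]:
    """Expected number of tuples falling on each side of the split point."""
    left = sum(prob_at_most(v, split) for v in values)
    return left, len(values) - left
```

A tuple whose interval straddles the splitting value counts fractionally toward both sides, so the split measure degrades gradually as intervals widen rather than flipping on a single point estimate.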

international conference on data engineering | 2010

Exploring power-performance tradeoffs in database systems

Zichen Xu; Yi-Cheng Tu; Xiaorui Wang

With the total energy consumption of computing systems increasing at a steep rate, much attention has been paid to the design of energy-efficient computing systems and applications. So far, database system design has focused on improving performance of query processing. The objective of this study is to experimentally explore the potential of power conservation in relational database management systems. We hypothesize that, by modifying the query optimizer in a DBMS to take the power cost of query plans into consideration, we will be able to reduce the power usage of database servers and control the tradeoffs between power consumption and system performance. We also identify the sources of such savings by investigating the resource consumption features during query processing in DBMSs. To that end, we provide an in-depth anatomy and qualitatively analyze the power profile of typical queries in the TPC benchmarks. We perform extensive experiments on a physical testbed based on the PostgreSQL system using workloads generated from the TPC benchmarks. Our hypothesis is supported by such experimental results: power savings in the range of 11% - 22% can be achieved by equipping the DBMS with a query optimizer that selects query plans based on both estimated processing time and power requirements.
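The idea of selecting plans by both estimated time and estimated power can be sketched as follows. This is a minimal illustration under stated assumptions, not the optimizer described in the paper: the `Plan` fields, the `power_weight` knob, and the normalized weighted-sum cost are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    name: str
    est_time_s: float   # estimated processing time (hypothetical, must be > 0)
    est_power_w: float  # estimated average power draw (hypothetical, must be > 0)


def pick_plan(plans: List[Plan], power_weight: float) -> Plan:
    """Return the plan minimizing a weighted sum of normalized time and power.

    power_weight = 0.0 behaves like a purely time-based optimizer;
    power_weight = 1.0 ranks plans by estimated power alone.
    """
    if not plans:
        raise ValueError("no candidate plans")
    if not 0.0 <= power_weight <= 1.0:
        raise ValueError("power_weight must be in [0, 1]")
    # Normalize by the best value of each metric so the two terms are comparable.
    min_time = min(p.est_time_s for p in plans)
    min_power = min(p.est_power_w for p in plans)

    def cost(p: Plan) -> float:
        return ((1.0 - power_weight) * p.est_time_s / min_time
                + power_weight * p.est_power_w / min_power)

    return min(plans, key=cost)
```

With two candidates such as `Plan("hash_join", 10.0, 120.0)` and `Plan("nested_loop", 12.0, 90.0)`, a weight of 0.0 picks the faster plan, while a weight of 0.5 already favors the slower plan that draws less power, which is the kind of tradeoff the abstract describes.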

ACM Transactions on Multimedia Computing, Communications, and Applications | 2005

An analytical study of peer-to-peer media streaming systems

Yi-Cheng Tu; Jianzhong Sun; Mohamed Hefeeda; Sunil Prabhakar

Recent research efforts have demonstrated the great potential of building cost-effective media streaming systems on top of peer-to-peer (P2P) networks. A P2P media streaming architecture can reach a large streaming capacity that is difficult to achieve in conventional server-based streaming services. Hybrid streaming systems that combine the use of dedicated streaming servers and P2P networks were proposed to build on the advantages of both paradigms. However, the dynamics of such systems and the impact of various factors on system behavior are not totally clear. In this article, we present an analytical framework to quantitatively study the features of a hybrid media streaming model. Based on this framework, we derive an equation to describe the capacity growth of a single-file streaming system. We then extend the analysis to multi-file scenarios. We also show how the system achieves optimal allocation of server bandwidth among different media objects. The unpredictable departure/failure of peers is a critical factor that affects the performance of P2P systems. We utilize the concept of peer lifespan to model peer failures. The original capacity growth equation is enhanced with coefficients generated from peer lifespans that follow an exponential distribution. We also propose a failure model under arbitrarily distributed peer lifespan. Results from large-scale simulations support our analysis.

Evolutionary Computation | 2008

An efficient non-dominated sorting method for evolutionary algorithms

Hongbing Fang; Qian Wang; Yi-Cheng Tu; M.F. Horstemeyer

We present a new non-dominated sorting algorithm to generate the non-dominated fronts in multi-objective optimization with evolutionary algorithms, particularly the NSGA-II. The non-dominated sorting algorithm used by NSGA-II has a time complexity of O(MN^2) in generating non-dominated fronts in one generation (iteration) for a population size N and M objective functions. Since generating non-dominated fronts takes the majority of total computational time (excluding the cost of fitness evaluations) of NSGA-II, making this algorithm faster will significantly improve the overall efficiency of NSGA-II and other genetic algorithms using non-dominated sorting. The new non-dominated sorting algorithm proposed in this study reduces the number of redundant comparisons existing in the algorithm of NSGA-II by recording the dominance information among solutions from their first comparisons. By utilizing a new data structure called the dominance tree and the divide-and-conquer mechanism, the new algorithm is faster than NSGA-II for different numbers of objective functions. Although the number of solution comparisons by the proposed algorithm is close to that of NSGA-II when the number of objectives becomes large, the total computational time shows that the proposed algorithm still has better efficiency because of the adoption of the dominance tree structure and the divide-and-conquer mechanism.
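For context, the baseline that the abstract compares against can be sketched as below. This is the standard pairwise non-dominated sort in the style of NSGA-II, which compares all pairs of solutions over M objectives; it is not the dominance-tree method proposed in the paper. Minimization of every objective and the function names are assumptions made for this example.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: List[Sequence[float]]) -> List[List[int]]:
    """Return fronts as lists of indices into points; front 0 is the Pareto front."""
    n = len(points)
    dominated_sets = [[] for _ in range(n)]  # indices that solution i dominates
    dom_count = [0] * n                      # how many solutions dominate i
    # Pairwise comparisons: N(N-1)/2 pairs, each costing O(M).
    for i in range(n):
        for j in range(i + 1, n):
            if dominates(points[i], points[j]):
                dominated_sets[i].append(j)
                dom_count[j] += 1
            elif dominates(points[j], points[i]):
                dominated_sets[j].append(i)
                dom_count[i] += 1
    fronts = []
    current = [i for i in range(n) if dom_count[i] == 0]
    while current:
        fronts.append(current)
        nxt = []
        # Peel off the current front and release the solutions it dominated.
        for i in current:
            for j in dominated_sets[i]:
                dom_count[j] -= 1
                if dom_count[j] == 0:
                    nxt.append(j)
        current = sorted(nxt)
    return fronts
```

The nested loop over pairs is where the O(MN^2) cost comes from, and it is this set of comparisons that the abstract says the proposed algorithm reduces by reusing dominance information from first comparisons.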

modeling, analysis, and simulation on computer and telecommunication systems | 2011

Fuzzy Modeling Based Resource Management for Virtualized Database Systems

Lixi Wang; Jing Xu; Ming Zhao; Yi-Cheng Tu; José A. B. Fortes

The hosting of databases on virtual machines (VMs) has great potential to improve the efficiency of resource utilization and the ease of deployment of database systems. This paper considers the problem of on-demand allocation of resources to a VM running a database serving dynamic and complex query workloads while meeting QoS (Quality of Service) requirements. An autonomic resource-management approach is proposed to address this problem. It uses adaptive fuzzy modeling to capture the behavior of a VM hosting a database with dynamically changing workloads and to predict its multi-type resource needs. A prototype of the proposed approach is implemented on Xen-based VMs and evaluated using workloads based on TPC-H and RUBiS. The results demonstrate that CPU and disk I/O bandwidth can be efficiently allocated to database VMs serving workloads with dynamically changing intensity and composition while meeting QoS targets. For TPC-H-based experiments, the resulting throughput is within 89.5% to 100% of what would be obtained using resource allocation based on peak loads. For RUBiS, the response time target (set based on the performance under peak-load-based allocation) is met 97% of the time. Moreover, substantial resources are saved (about 62.6% of CPU and 76.5% of disk I/O bandwidth) in comparison to peak-load-based allocation.

very large data bases | 2012

PET: reducing database energy cost via query optimization

Zichen Xu; Yi-Cheng Tu; Xiaorui Wang

Energy conservation is an increasingly important issue in designing modern database management systems (DBMSs). This requires careful thinking about the tradeoffs between energy and performance. Despite the significant amount of effort at the hardware level to make the major components consume less energy, we argue for a revisit of the DBMS query processing mechanism to identify and harvest the potential of energy saving. However, the state-of-the-art architecture of DBMSs does not take energy usage into consideration in its design. A major challenge in developing an energy-aware DBMS is to design and implement a cost-based query optimizer that evaluates query plans by both performance and energy costs. By following such a strategy, our previous work revealed that energy-efficient query plans do not necessarily have the shortest processing time. This demo proposal introduces PET -- an energy-aware query optimization framework that is built as a part of the PostgreSQL kernel. PET, via its power cost estimation module and plan evaluation model, enables the database system to run under a DBA-specified energy/performance tradeoff level. PET contains a power cost estimator that can accurately estimate the power cost of query plans at compile time, and a query evaluation engine through which the DBA can configure key PET parameters towards the desired tradeoff. The software to be demonstrated will also include a workload engine for producing large quantities of queries and data sets. Our demonstration will show how PET functions via a comprehensive set of views from its graphical user interface named PET Viewer. Through such interfaces, a user can achieve a good understanding of the energy-related query optimization and cost-based plan generation. Users are also allowed to interact with PET to experience the different energy/performance tradeoffs by changing PET and workload parameters at query runtime.

conference on information and knowledge management | 2012

Trust prediction via aggregating heterogeneous social networks

Jin Huang; Feiping Nie; Heng Huang; Yi-Cheng Tu

Along with the increasing popularity of social web sites, users rely more on trustworthiness information for many online activities. However, such social network data often suffer from severe data sparsity and are not able to provide users with enough information. Therefore, trust prediction has emerged as an important topic in social network research. Traditional approaches explore the topology of the trust graph. Previous research in sociology and our life experience suggest that people who are in the same social circle often exhibit similar behavior and tastes. Such ancillary information is often accessible and therefore could potentially help trust prediction. In this paper, we address the link prediction problem by aggregating heterogeneous social networks and propose a novel joint manifold factorization (JMF) method. Our new joint learning model explores the user-group-level similarity between correlated graphs and simultaneously learns the individual graph structure; therefore, the shared structures and patterns from multiple social networks can be utilized to enhance the prediction tasks. As a result, we not only improve the trust prediction in the target graph, but also facilitate other information retrieval tasks in the auxiliary graphs. To optimize the objective function, we break it down into several manageable sub-problems, then further establish the theoretical convergence with the aid of an auxiliary function. Extensive experiments were conducted on real-world data sets and all empirical results demonstrated the effectiveness of our method.

international conference on management of data | 2014

A system for energy-efficient data management

Yi-Cheng Tu; Xiaorui Wang; Bo Zeng; Zichen Xu

Energy consumption of computer systems has increased at a steep rate in recent years. Following extensive energy-related research and practice in the hardware and OS communities, much attention has been paid to developing energy-efficient applications. With database systems being a heavy energy consumer in modern data centers, we face the challenge of designing DBMSs with energy as a first-class performance goal. This paper presents our ongoing work in designing and implementing a DBMS that enables significant energy conservation while maintaining other performance targets. We follow two new strategies in DBMS implementation to achieve our system design goal. The first one is to change the resource consumption patterns via energy-aware query optimization and reorganizing data records to enable load consolidation in disks. The second strategy is active control of power modes of hardware (i.e., CPU and hard disks) toward energy reduction. Specifically, we use control-theoretic techniques to allow dynamic adjustment of CPU frequency and online data migration to achieve disk load consolidation. Preliminary results have shown the effectiveness of our design.

international conference on distributed computing systems | 2013

Dynamic Energy Estimation of Query Plans in Database Systems

Zichen Xu; Yi-Cheng Tu; Xiaorui Wang

Data centers are well known to consume large amounts of energy. Since databases are among the major applications in a typical data center, building energy-aware database systems has recently become an active research topic. The quantification of the energy cost of database systems is an important task in designing such systems. In this paper, we report our recent efforts on this topic, with a focus on the energy cost estimation of query plans during query optimization. We start from building a series of physical models for energy estimation of individual relational operators based on their resource consumption patterns. Since the execution of individual queries is a combination of relational operators, we use the physical models as a basis for a comprehensive energy cost estimation model for entire query plans. To further improve model accuracy under system dynamics and the variations of workload characteristics, we develop an online model estimation scheme that dynamically corrects the static model based on advanced modeling techniques adopted from control engineering. The models are implemented in a real database and evaluated on a physical test bed with a comprehensive set of experimental workloads. The results show that our solution achieves a high accuracy (above 90%) in energy estimation despite noise from the system and workloads.

ACM Transactions on Knowledge Discovery From Data | 2013

Social trust prediction using heterogeneous networks

Jin Huang; Feiping Nie; Heng Huang; Yi-Cheng Tu; Yu Lei

Along with the increasing popularity of social websites, online users rely more on trustworthiness information to make decisions, extract and filter information, and tag and build connections with other users. However, such social network data often suffer from severe data sparsity and are not able to provide users with enough information. Therefore, trust prediction has emerged as an important topic in social network research. Traditional approaches are primarily based on exploring trust graph topology itself. However, research in sociology and our life experience suggest that people who are in the same social circle often exhibit similar behaviors and tastes. To take advantage of the ancillary information for trust prediction, the challenge then becomes what to transfer and how to transfer. In this article, we address this problem by aggregating heterogeneous social networks and propose a novel joint social networks mining (JSNM) method. Our new joint learning model explores the user-group-level similarity between correlated graphs and simultaneously learns the individual graph structure; therefore, the shared structures and patterns from multiple social networks can be utilized to enhance the prediction tasks. As a result, we not only improve the trust prediction in the target graph but also facilitate other information retrieval tasks in the auxiliary graphs. To optimize the proposed objective function, we use the alternative technique to break down the objective function into several manageable subproblems. We further introduce the auxiliary function to solve the optimization problems with rigorously proved convergence. Extensive experiments have been conducted on both synthetic and real-world data. All empirical results demonstrate the effectiveness of our method.
