Hailing Yu
University of California, Santa Barbara
Publication
Featured research published by Hailing Yu.
Database and Expert Systems Applications | 2005
Hailing Yu; Hua Gang Li; Ping Wu; Divyakant Agrawal; Amr El Abbadi
Ranking-aware queries, or top-k queries, have received much attention recently in various contexts such as the web, multimedia retrieval, relational databases, and distributed systems. Top-k queries play a critical role in many decision-making activities such as identifying interesting objects, network monitoring, and load balancing. In this paper, we study the ranking aggregation problem in distributed systems. Prior research addressing this problem did not take data distributions into account and simply assumed a uniform data distribution among nodes, which is not realistic for real data sets and is, in general, inefficient. We propose three efficient algorithms that consider data distributions in different ways. Our extensive experiments demonstrate the advantages of our approaches in terms of bandwidth consumption.
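To make the problem setting concrete, the following is a minimal sketch of a naive baseline for distributed ranking aggregation, not any of the three algorithms proposed in the paper. It assumes that each node holds a partial score per object and that the global score is the SUM of the partial scores; the function name `naive_distributed_top_k` and the sample data are hypothetical. Every node ships all of its entries to a coordinator, so the message count gives a rough sense of the bandwidth cost that distribution-aware approaches aim to reduce.

```python
from collections import Counter
import heapq


def naive_distributed_top_k(node_scores, k):
    """Naive baseline: every node sends all (object, score) pairs to a coordinator.

    node_scores: list of dicts, one per node, mapping object id -> partial score.
    Returns (top-k list of (object, total score), number of pairs transferred).
    Assumption: the global score of an object is the SUM of its partial scores.
    """
    total = Counter()
    transferred = 0
    for local in node_scores:
        for obj, score in local.items():
            total[obj] += score
            transferred += 1  # one (object, score) pair shipped to the coordinator
    # Rank by total score descending; break ties by object id for determinism.
    top = heapq.nsmallest(k, total.items(), key=lambda kv: (-kv[1], kv[0]))
    return top, transferred


# Hypothetical data: three nodes, each holding partial scores for some objects.
nodes = [
    {"x": 5, "y": 1, "z": 3},
    {"x": 2, "y": 7},
    {"y": 1, "z": 4},
]
top, sent = naive_distributed_top_k(nodes, 2)
print(top, sent)
```

The baseline always transfers every entry regardless of how the scores are distributed across nodes, which is the cost a bandwidth-conscious algorithm would try to avoid.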
Very Large Data Bases | 2003
Hailing Yu; Divyakant Agrawal; Amr El Abbadi
Due to advances in semiconductor manufacturing, the gap between main memory and secondary storage is constantly increasing. This becomes a significant performance bottleneck for Database Management Systems, which rely heavily on secondary storage to store large datasets. Recent advances in nanotechnology have led to the invention of alternative means for persistent storage. In particular, MicroElectroMechanical Systems (MEMS) based storage technology has emerged as the leading candidate for next-generation storage systems. In order to integrate MEMS-based storage into conventional computing platforms, new techniques are needed for I/O scheduling and data placement. In the context of relational data, it has been observed that access to relations needs to be enabled in both row-wise and column-wise fashion. In this paper, we exploit the physical characteristics of MEMS-based storage devices to develop a data placement scheme for relational data that enables retrieval in both a row-wise and a column-wise manner. We demonstrate that this data layout not only improves I/O utilization, but also results in better cache performance.
IEEE Conference on Mass Storage Systems and Technologies | 2003
Hailing Yu; Divyakant Agrawal; A. El Abbadi
MEMS-based storage devices are currently being developed to narrow the gap between processor and disk speeds. MEMS-based storage devices have a different architecture from disk devices, so algorithms designed for disks, such as those for I/O scheduling and data placement, need to be revisited. In this paper, we focus on developing an I/O scheduling algorithm for MEMS-based storage devices. Our theoretical analysis shows that this algorithm is guaranteed to perform within twice the optimal time for any workload.
Distributed and Parallel Databases | 2006
Hailing Yu; Divyakant Agrawal; Amr El Abbadi
Due to the large difference between seek time and transfer time in current disk technology, it is advantageous to perform large I/O using a single sequential access rather than multiple small random I/O accesses. However, prior optimal cost and data placement approaches for processing range queries over two-dimensional datasets do not consider this property. In particular, these techniques do not consider the issue of sequential data placement when multiple I/O blocks need to be retrieved from a single device. In this paper, we reevaluate the optimal cost of range queries by declustering two-dimensional datasets over multiple devices, and prove that, in general, it is impossible to achieve the new optimal cost. This is because disks cannot facilitate two-dimensional sequential access, which is required by the new optimal cost. We then revisit the existing data allocation schemes under the new optimal cost, and show that none of them can achieve it. Fortunately, MEMS-based storage is being developed to reduce I/O cost. We first show that the two-dimensional sequential access requirement cannot be satisfied by simply modeling MEMS-based storage as conventional disks. We then propose a new placement scheme that exploits the physical properties of MEMS-based storage to solve this problem. Our theoretical analysis and experimental results show that the new scheme achieves almost optimal I/O costs.
Data and Knowledge Engineering | 2007
Hua Gang Li; Hailing Yu; Divyakant Agrawal; Amr El Abbadi
Ranking-aware queries have been gaining much attention recently in many applications such as multimedia databases, search engines, and data streams. They are, however, not restricted to such applications; they are also very useful in On-Line Analytical Processing (OLAP) applications. In this paper, we introduce aggregation ranking queries in OLAP data cubes, motivated by an online advertisement tracking data warehouse application. These queries aggregate information over a specified range and then return the ranked order of the aggregated values. For instance, an advertiser might be interested in the top-k publishers over the last three months in terms of sales obtained through the online advertisements placed on the publishers. They differ from range aggregate queries in that range aggregate queries are mainly concerned with an aggregate operator such as SUM or MIN/MAX over the selected ranges of all dimensions in the data cubes. Existing techniques for range aggregate queries are not able to process aggregation ranking queries efficiently. Hence, in this paper we propose new algorithms to handle this problem. The proposed algorithms use both ranking and cumulative information to progressively rank aggregation results. Furthermore, we empirically evaluate our techniques, and the experimental results show that the query cost is improved significantly.
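The advertiser example in this abstract can be sketched as a naive full-scan baseline. This is only an illustration of what an aggregation ranking query returns, not the progressive algorithms proposed in the paper; the function name `top_k_publishers`, the record layout, and the sample data are all hypothetical, and SUM is assumed as the aggregate operator.

```python
from collections import defaultdict
from datetime import date
import heapq


def top_k_publishers(sales, start, end, k):
    """Naive aggregation ranking query: SUM sales per publisher over a date
    range, then return the k publishers with the largest totals.

    sales: iterable of (publisher, day, amount) records (hypothetical layout).
    """
    totals = defaultdict(float)
    for publisher, day, amount in sales:
        if start <= day <= end:  # keep only records inside the query range
            totals[publisher] += amount
    # Rank by total descending; break ties by publisher name for determinism.
    return heapq.nsmallest(k, totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical records; the last one falls outside the queried three months.
sales = [
    ("pub_a", date(2006, 1, 5), 120.0),
    ("pub_b", date(2006, 2, 10), 300.0),
    ("pub_a", date(2006, 3, 1), 250.0),
    ("pub_c", date(2006, 3, 15), 90.0),
    ("pub_b", date(2005, 11, 2), 500.0),
]
result = top_k_publishers(sales, date(2006, 1, 1), date(2006, 3, 31), 2)
print(result)
```

This baseline touches every record and aggregates every publisher before ranking, which is the kind of query cost that ranking-aware techniques are meant to reduce.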
Extending Database Technology | 2004
Hailing Yu; Divyakant Agrawal; Amr El Abbadi
Due to the large difference between seek time and transfer time in current disk technology, it is advantageous to perform large I/O using a single sequential access rather than multiple small random I/O accesses. However, prior optimal cost and data placement approaches for processing range queries over two-dimensional datasets do not consider this property. In particular, these techniques do not consider the issue of sequential data placement when multiple I/O blocks need to be retrieved from a single device. In this paper, we reevaluate the optimal cost of range queries by declustering two-dimensional datasets over multiple devices, and prove that, in general, it is impossible to achieve the new optimal cost. This is because disks cannot facilitate two-dimensional sequential access, which is required by the new optimal cost. Fortunately, MEMS-based storage is being developed to reduce I/O cost. We first show that the two-dimensional sequential access requirement cannot be satisfied by simply modeling MEMS-based storage as conventional disks. We then propose a new placement scheme that exploits the physical properties of MEMS-based storage to solve this problem. Our theoretical analysis and experimental results show that the new scheme achieves almost optimal results.
Very Large Data Bases | 2007
Hailing Yu; Divyakant Agrawal; Amr El Abbadi
Due to recent advances in semiconductor manufacturing, the gap between main memory and disks is constantly increasing. This leads to a significant performance bottleneck for Relational Database Management Systems. Recent advances in nanotechnology have led to the invention of MicroElectroMechanical Systems (MEMS) based storage technology to replace disks. In this paper, we exploit the physical characteristics of MEMS-based storage devices to develop a placement scheme for relational data that enables retrieval in both a row-wise and a column-wise manner. We develop algorithms for different relational operations based on this data layout. Our experimental results and analysis demonstrate that this data layout not only improves I/O utilization, but also results in better cache performance for a variety of relational operations.
Lecture Notes in Computer Science | 2005
Hailing Yu; Hua-Gang Li; Ping Wu; Divyakant Agrawal; Amr El Abbadi
IEEE Data(base) Engineering Bulletin | 2005
Nagender Bandi; Chengyu Sun; Hailing Yu; Divyakant Agrawal; Amr El Abbadi
Architecture conscious information management systems | 2005
Hailing Yu; Divyakant Agrawal; Amr El Abbadi