Tianlei Hu | Researchain

Publication

Featured researches published by Tianlei Hu.

IEEE Transactions on Knowledge and Data Engineering | 2014

BestPeer++: A Peer-to-Peer Based Large-Scale Data Processing Platform

Gang Chen; Tianlei Hu; Dawei Jiang; Peng Lu; Kian-Lee Tan; Hoang Tam Vo; Sai Wu

The corporate network is often used for sharing information among the participating companies and facilitating collaboration in a certain industry sector where companies share a common interest. It can effectively help the companies to reduce their operational costs and increase the revenues. However, the inter-company data sharing and processing poses unique challenges to such a data management system including scalability, performance, throughput, and security. In this paper, we present BestPeer++, a system which delivers elastic data sharing services for corporate network applications in the cloud based on BestPeer—a peer-to-peer (P2P) based data management platform. By integrating cloud computing, database, and P2P technologies into one system, BestPeer++ provides an economical, flexible and scalable platform for corporate network applications and delivers data sharing services to participants based on the widely accepted pay-as-you-go business model. We evaluate BestPeer++ on Amazon EC2 Cloud platform. The benchmarking results show that BestPeer++ outperforms HadoopDB, a recently proposed large-scale data processing system, in performance when both systems are employed to handle typical corporate network workloads. The benchmarking results also demonstrate that BestPeer++ achieves near linear scalability for throughput with respect to the number of peer nodes.

international conference on data engineering | 2012

BestPeer++: A Peer-to-Peer Based Large-Scale Data Processing Platform

Gang Chen; Tianlei Hu; Dawei Jiang; Peng Lu; Kian-Lee Tan; Hoang Tam Vo; Sai Wu

The corporate network is often used for sharing information among the participating companies and facilitating collaboration in a certain industry sector where companies share a common interest. It can effectively help the companies to reduce their operational costs and increase the revenues. However, the inter-company data sharing and processing poses unique challenges to such a data management system including scalability, performance, throughput, and security. In this paper, we present BestPeer++, a system which delivers elastic data sharing services for corporate network applications in the cloud based on BestPeer - a peer-to-peer (P2P) based data management platform. By integrating cloud computing, database, and P2P technologies into one system, BestPeer++ provides an economical, flexible and scalable platform for corporate network applications and delivers data sharing services to participants based on the widely accepted pay-as-you-go business model. We evaluate BestPeer++ on Amazon EC2 Cloud platform. The benchmarking results show that BestPeer++ outperforms HadoopDB, a recently proposed large-scale data processing system, in performance when both systems are employed to handle typical corporate network workloads. The benchmarking results also demonstrate that BestPeer++ achieves near linear scalability for throughput with respect to the number of peer nodes.

conference on information and knowledge management | 2010

(k,P)-anonymity: towards pattern-preserving anonymity of time-series data

Xuan Shang; Ke Chen; Lidan Shou; Gang Chen; Tianlei Hu

The challenges with privacy protection of time series are mainly due to the complex nature of the data and the queries performed on them. We study the anonymization of time series while trying to support complex queries, such as range and pattern similarity queries, on the published data. The conventional k-anonymity cannot effectively address this problem as it may suffer severe pattern loss. We propose a novel anonymization model called (k,P)-anonymity for pattern-rich time series. This model publishes both the attribute values and the patterns of time series in separate data forms. We demonstrate that our model can prevent linkage attacks on the published data while effectively supporting a wide variety of queries on the anonymized data. We also design an efficient algorithm for enforcing (k,P)-anonymity on time series data.

cyberworlds | 2008

Query Triggered Crawling Strategy: Build a Time Sensitive Vertical Search Engine

Yu Wu; Lidan Shou; Tianlei Hu; Gang Chen

In today's information society, it is important to retrieve fresh information. Many of the vertical search results are valid to users in only a short period of time. But due to resource constraints, it is not possible to keep the entire local storage synchronized with the Web. We implemented a TSVS (time sensitive vertical search engine) prototype named Velocisaurus focused on time-critical airfare discount information search to investigate the time critical requirements of vertical search and proposed a QTC (query triggered crawling) strategy to coordinate the crawling systems by real-time user queries and solve this problem. Experiments show that QTC-driven crawlers significantly improve the freshness of the search results and utilize the resources more efficiently compared to regular search engine crawlers.

database systems for advanced applications | 2010

Update migration: an efficient B+ tree for flash storage

Chang Xu; Lidan Shou; Gang Chen; Cheng Yan; Tianlei Hu

More and more evidence indicates that flash storage is a potential substitute for magnetic disk in the foreseeable future. Due to the high-speed random reads, flash storage could improve the performance of DBMS significantly in OLTP applications. However, previous research has shown that small-to-moderate random overwrites on flash are particularly expensive, which implies that the conventional DBMS is not ready to run on the flash storage. In this paper, we propose the design of a variant of B+ tree for flash storage, namely the Update-Migration B+ tree. In the UM-B+ tree, a small quantity of updates will be migrated, rather than being executed directly, to its parent node in the form of update records when a dirty node is evicted from main memory. Further accesses to the child node will cause the update records stored in the parent node to be executed when reading the child node from the permanent storage (flash). We propose the detailed structure and operations of UM-B+ tree. We also discuss expanding the UM-B+ tree to the transaction system based on the Aries/IM. Experiments confirm that our proposed UM-B+ tree significantly reduces the random overwrites of B+ tree in typical OLTP workloads, therefore securing a significant performance improvement on flash storage.

database and expert systems applications | 2010

Towards efficient concurrent scans on flash disks

Chang Xu; Lidan Shou; Gang Chen; Wei Hu; Tianlei Hu; Ke Chen

Flash disk, also known as Solid State Disk (SSD), is widely considered by the database community as a next-generation storage medium which will completely or to a large extent replace magnetic disk in data-intensive applications. However, the vast differences in the I/O characteristics between SSD and magnetic disk imply that a considerable part of the existing database techniques need to be modified to gain the best efficiency on flash storage. In this paper, we study the problem of large-scale concurrent disk scans which are frequently used in the decision support systems. We demonstrate that the conventional sharing techniques of multiple concurrent scans are not suitable for SSDs as they are designed to exploit the characteristics of hard disk drives (HDDs). To leverage the fast random reads on SSD, we propose a new framework named Semi-Sharing Scan (S3) in this paper. S3 shares the readings between scans of similar speeds to save the bandwidth utilization. Meanwhile, it compensates the faster scans by executing random I/O requests separately, in order to reduce single scan latency. By utilizing techniques called group splitting and I/O scheduling, S3 aims at achieving good performance for concurrent scans on various workloads. We implement the S3 framework on a PostgreSQL database deployed on an enterprise SSD drive. Experiments demonstrate that S3 outperforms the conventional schemes in both bandwidth utilization and single scan latency.

international world wide web conferences | 2008

Pivotbrowser: a tag-space image searching prototype

Xiaoyan Li; Lidan Shou; Gang Chen; Xiaolong Zhang; Tianlei Hu; Jinxiang Dong

We propose a novel iterative searching and refining prototype for tagged images. This prototype, named PivotBrowser, captures semantically similar tag sets in a structure called pivot. By constructing a pivot for a textual query, PivotBrowser first selects candidate images possibly relevant to the query. The tags contained in these candidate images are then selected in terms of their tag relevances to the pivot. The shortlisted tags are clustered and one of the tag clusters is used to select the results from the candidate images. Ranking of the images in each partition is based on their relevance to the tag cluster. With the guidance of the tag clusters presented, a user is able to perform searching and iterative query refinement.

Archive | 2011