Rebecca J. Stones
Nankai University
Publication
Featured research published by Rebecca J. Stones.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2016
Zhaohua Zhang; Jiancong Tong; Haibing Huang; Jin Liang; Tianlong Li; Rebecca J. Stones; Gang Wang; Xiaoguang Liu
Large-scale search engines need to answer thousands of queries per second over billions of documents, which is typically done by querying a large inverted index. Many highly optimized integer encoding techniques are applied to compress the inverted index and reduce the query processing time. In this paper, we propose a new grammar-based inverted index compression scheme, which can improve the performance of both index compression and query processing. Our approach identifies patterns (common subsequences of docIDs) among different posting lists and generates a context-free grammar to succinctly represent the inverted index. To further optimize the compression performance, we carefully redesign the index structure. Experiments show a reduction of up to 8.8% in space usage while decompression is up to 14% faster. We also design an efficient list intersection algorithm which utilizes the proposed grammar-based inverted index. We show that our scheme can be combined with common docID reassignment methods and encoding techniques, and yields about 14% to 27% higher throughput for AND queries by utilizing multiple threads.
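To make the idea of replacing shared docID patterns with grammar rules concrete, here is a minimal Python sketch. It is not the paper's algorithm: it swaps in a single Re-Pair-style step that replaces the most frequent adjacent docID pair across all posting lists with a new nonterminal. The function names and the `"R0"` symbol are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple, Union

Symbol = Union[int, str]  # int = docID, str = grammar nonterminal (assumption)


def replace_most_common_pair(
    lists: List[List[Symbol]], new_symbol: str
) -> Tuple[List[List[Symbol]], Optional[Tuple[Symbol, Symbol]]]:
    """Replace the most frequent adjacent pair with `new_symbol`.

    Returns the rewritten lists and the replaced pair, or the input
    lists and None when no pair occurs at least twice.
    """
    counts: Counter = Counter()
    for lst in lists:
        counts.update(zip(lst, lst[1:]))
    if not counts:
        return lists, None
    pair, freq = counts.most_common(1)[0]
    if freq < 2:
        return lists, None

    rewritten = []
    for lst in lists:
        out: List[Symbol] = []
        i = 0
        while i < len(lst):
            if i + 1 < len(lst) and (lst[i], lst[i + 1]) == pair:
                out.append(new_symbol)  # pair becomes one nonterminal
                i += 2
            else:
                out.append(lst[i])
                i += 1
        rewritten.append(out)
    return rewritten, pair


def expand(lst: List[Symbol], grammar: Dict[str, Tuple[Symbol, Symbol]]) -> List[int]:
    """Recursively expand nonterminals back into the original docIDs."""
    out: List[int] = []
    for sym in lst:
        if isinstance(sym, str):
            out.extend(expand(list(grammar[sym]), grammar))
        else:
            out.append(sym)
    return out


postings = [[1, 2, 3, 7], [2, 3, 9], [2, 3, 7]]
compressed, rule = replace_most_common_pair(postings, "R0")
grammar = {"R0": rule}
```

Repeating the step with fresh nonterminals would build a larger grammar; the index layout, integer encoding, and list intersection described in the abstract are outside this sketch.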
Symposium on Reliable Distributed Systems | 2016
Peng Li; Jing Li; Rebecca J. Stones; Gang Wang; Zhongwei Li; Xiaoguang Liu
Common distributed storage systems use data replication to improve system reliability and maintain data availability, but at the cost of disk storage. In order to lower storage costs, data may instead be stored according to erasure codes, but this results in greater network and disk traffic when data blocks are reconstructed following an erasure. These methods are also passive, i.e., they only reconstruct data after failures occur. In this paper, we present a proactive erasure coding scheme (ProCode). We monitor the health of disks via drive failure prediction and automatically adjust the replication factor of data blocks on at-risk disks to ensure data safety. In this way, we achieve fast recovery after disk failures without significantly increasing the storage overhead. ProCode is implemented as an extension to HDFS-RAID used by Facebook. Compared with replication storage and erasure coding, ProCode improves system reliability and availability. Specifically, experimental results show 2 or more orders of magnitude reduction in the average number of data loss events over a 10-year period, a 63% or greater drop in degraded read latency, and a 78% drop in recovery time.
Electronic Notes in Discrete Mathematics | 2015
Raúl M. Falcón; Rebecca J. Stones
Isotopisms of the set $R_{r,s,n}$ of $r \times s$ partial Latin rectangles based on $n$ symbols constitute a finite group that acts on this set by permuting rows, columns and symbols. The number of partial Latin rectangles preserved by this action only depends on the conjugacy classes of these permutations. In this paper, the distribution of the isotopism group into conjugacy classes is considered in order to determine the distribution of $R_{r,s,n}$ into isomorphism and isotopism classes, for all $r, s, n \leq 6$.
Reliability Engineering & System Safety | 2017
Jing Li; Rebecca J. Stones; Gang Wang; Xiaoguang Liu; Zhongwei Li; Ming Xu
This paper proposes two hard drive failure prediction models based on Decision Trees (DTs) and Gradient Boosted Regression Trees (GBRTs) which perform well in prediction performance as well as stability and interpretability. The models are evaluated on a real-world dataset containing 121,698 drives in total. Experimental results show the DT model predicts over 93% of failures at a false alarm rate under 0.01%, and the GBRT model can achieve about 90% failure detection rate without any false alarms. Moreover, the GBRT model evaluates drive health (or fault probability) which provides a quantitative indicator of failure urgency. This enables operators to allocate system resources accordingly for pre-warning migrations while maintaining the quality of user services.
Symposium on Reliable Distributed Systems | 2016
Jing Li; Rebecca J. Stones; Gang Wang; Zhongwei Li; Xiaoguang Liu; Kang Xiao
Traditionally, disk failure prediction accuracy is used to evaluate disk failure prediction models. However, accuracy may not reflect their practical usage (protecting against failures, rather than only predicting failures) in cloud storage systems. In this paper, we propose two new metrics for disk failure prediction models: migration rate, which measures how much at-risk data is protected as a result of correct failure predictions, and mismigration rate, which measures how much data is migrated needlessly as a result of false failure predictions. To demonstrate their effectiveness, we compare disk failure prediction methods: (a) a classification tree (CT) model vs. a state-of-the-art recurrent neural network (RNN) model, and (b) a proposed residual life prediction model based on gradient boosted regression trees (GBRTs) vs. RNN. While prediction accuracy experiments favor the RNN model, migration rate experiments can favor the CT and GBRT models (depending on transfer rates). We conclude that prediction accuracy can be a misleading metric. Moreover, the proposed GBRT model offers a practical improvement in disk failure prediction in real-world data centers.
Discrete Mathematics | 2017
Raúl M. Falcón; Rebecca J. Stones
An $r \times s$ partial Latin rectangle $(l_{ij})$ is an $r \times s$ matrix containing elements of $\{1,2,\ldots,n\} \cup \{\cdot\}$ such that each row and each column contain at most one copy of any symbol in $\{1,2,\ldots,n\}$. An entry is a triple $(i,j,l_{ij})$ with $l_{ij} \neq \cdot$. Partial Latin rectangles are operated on by permuting the rows, columns, and symbols, and by uniformly permuting the coordinates of the set of entries. The stabilizers under these operations are called the autotopism group and the autoparatopism group, respectively. We develop the theory of symmetries of partial Latin rectangles, introducing the concept of a partial Latin rectangle graph. We give constructions of $m$-entry partial Latin rectangles with trivial autotopism groups for all possible autoparatopism groups (up to isomorphism) when: (a) $r=s=n$, i.e., partial Latin squares, (b) $r=2$ and $s=n$, and (c) $r=2$ and $s \neq n$.
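The defining condition of a partial Latin rectangle can be checked mechanically. The Python sketch below is illustrative and not taken from the paper: it assumes empty cells are represented by `None`, and the function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Cell = Optional[int]  # None marks an empty cell (representation is an assumption)


def is_partial_latin_rectangle(rows: List[List[Cell]], n: int) -> bool:
    """Check that no symbol in 1..n repeats within any row or column."""
    if not rows:
        return True
    s = len(rows[0])
    if any(len(row) != s for row in rows):
        return False  # not a rectangular matrix
    for row in rows:
        filled = [x for x in row if x is not None]
        if any(not (1 <= x <= n) for x in filled):
            return False  # symbol outside {1, ..., n}
        if len(filled) != len(set(filled)):
            return False  # repeated symbol in a row
    for j in range(s):
        column = [row[j] for row in rows if row[j] is not None]
        if len(column) != len(set(column)):
            return False  # repeated symbol in a column
    return True


def entries(rows: List[List[Cell]]) -> Set[Tuple[int, int, int]]:
    """Return the entry set as 1-indexed (row, column, symbol) triples."""
    return {
        (i + 1, j + 1, x)
        for i, row in enumerate(rows)
        for j, x in enumerate(row)
        if x is not None
    }
```

For example, `[[1, None, 2], [None, 1, None]]` with `n = 3` passes the check, while `[[1, None], [1, 2]]` fails because symbol 1 repeats in the first column.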
International Joint Conference on Neural Networks | 2016
Shuai Pang; Yuhan Jia; Rebecca J. Stones; Gang Wang; Xiaoguang Liu
Statistical and machine learning methods have been proposed to predict hard drive failure based on SMART attributes, and many achieve good performance. However, these models do not give a good indication as to when a drive will fail, only predicting that it will fail. To this end, we propose a new notion of a drive's health degree based on the remaining working time of a hard drive before actual failure occurs. An ensemble learning method is implemented to predict these health degrees: four popular individual classifiers are individually trained and used in a Combined Bayesian Network (CBN). Experiments show that the CBN model can give a health assessment under the proposed definition where drives are predicted to fail no later than their actual failure time 70% or more of the time, while maintaining prediction performance standards at least approximately as good as the individual classifiers.
IEEE Conference on Mass Storage Systems and Technologies | 2016
Jingwei Ma; Rebecca J. Stones; Yuxiang Ma; Jingui Wang; Junjie Ren; Gang Wang; Xiaoguang Liu
During data deduplication, on-disk fingerprint lookups lead to high disk traffic, resulting in a bottleneck. In this paper, we propose a “lazy” data deduplication method which buffers incoming fingerprints and performs on-disk lookups in batches, aiming to reduce the disk bottleneck. In deduplication in general, prefetching is used to improve the cache hit rate by exploiting locality within the incoming fingerprint stream. For lazy deduplication, we design a buffering strategy that preserves locality in order to similarly facilitate prefetching. Experimental results indicate that the lazy method improves fingerprint identification performance by over 50% compared with an “eager” method with the same data layout.
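The buffering idea can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the paper's design: the on-disk index is simulated by an in-memory set, the class and attribute names are hypothetical, and the locality-preserving buffering and prefetching described above are omitted.

```python
from typing import Iterable, List, Tuple


class LazyDeduplicator:
    """Buffer fingerprints and resolve them in batches (illustrative only)."""

    def __init__(self, on_disk_index: Iterable[str], batch_size: int = 4):
        # A set stands in for the on-disk fingerprint index (assumption).
        self.index = set(on_disk_index)
        self.batch_size = batch_size
        self.buffer: List[str] = []
        self.batch_lookups = 0  # number of simulated batched disk lookups
        self.results: List[Tuple[str, bool]] = []  # (fingerprint, is_duplicate)

    def add(self, fingerprint: str) -> None:
        """Queue a fingerprint; look it up only once the buffer is full."""
        self.buffer.append(fingerprint)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Resolve all buffered fingerprints with one batched lookup."""
        if not self.buffer:
            return
        self.batch_lookups += 1
        for fp in self.buffer:
            self.results.append((fp, fp in self.index))
            self.index.add(fp)  # later copies of fp now count as duplicates
        self.buffer.clear()


dedup = LazyDeduplicator({"a"}, batch_size=3)
for fp in ["a", "b", "b", "c"]:
    dedup.add(fp)
dedup.flush()  # resolve the final partial batch
```

An eager variant would perform one lookup per call to `add`; here four fingerprints are resolved with two batched lookups.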
Designs, Codes and Cryptography | 2016
Rebecca J. Stones; Ming Su; Xiaoguang Liu; Gang Wang; Sheng Lin
We present a novel secret sharing scheme where the secret is an autotopism (a symmetry) of a Latin square. Previously proposed secret sharing schemes involving Latin squares have many drawbacks: (a) Latin squares contain
Electronic Notes in Discrete Mathematics | 2018
Eiran Danan; Raúl M. Falcón; Dani Kotlar; Trent G. Marbach; Rebecca J. Stones