Cristian Ungureanu
Princeton University
Publication
Featured research published by Cristian Ungureanu.
ACM Transactions on Storage | 2012
Hyojun Kim; Nitin Agrawal; Cristian Ungureanu
Conventional wisdom holds that storage is not a big contributor to application performance on mobile devices. Flash storage (the type most commonly used today) draws little power, and its performance is thought to exceed that of the network subsystem. In this article, we present evidence that storage performance does indeed affect the performance of several common applications such as Web browsing, maps, application install, email, and Facebook. For several Android smartphones, we find that just by varying the underlying flash storage, performance over WiFi can typically vary between 100% and 300% across applications; in one extreme scenario, the variation jumped to over 2000%. With a faster network (set up over USB), the performance variation rose even further. We identify the reasons for the strong correlation between storage and application performance to be a combination of poor flash device performance, random I/O from application databases, and heavy-handed use of synchronous writes. Based on our findings, we implement and evaluate a set of pilot solutions to address the storage performance deficiencies in smartphones.
international conference on autonomic computing | 2005
Guofei Jiang; Haifeng Chen; Cristian Ungureanu; Kenji Yoshihira
Detection and diagnosis of faults in a large-scale distributed system is a formidable task. Interest in monitoring and using traces of user requests for fault detection has been on the rise recently. In this paper we propose novel fault detection methods based on abnormal trace detection. One essential problem is how to represent the large amount of training trace data compactly as an oracle. Our key contribution is the novel use of varied-length n-grams and automata to characterize normal traces. A new trace is compared against the learned automata to determine whether it is abnormal. We develop algorithms to automatically extract n-grams and construct multiresolution automata from training data. Further, both deterministic and multihypothesis algorithms are proposed for detection. We inspect the trace constraints of real application software and verify the existence of long n-grams. Our approach is tested in a real system with injected faults and achieves good results in experiments.
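The core idea of comparing a new trace against what was seen in training can be illustrated with a much simpler scheme than the one the abstract describes. The sketch below uses fixed-length n-grams stored in a set, not the paper's varied-length n-grams or multiresolution automata; the function names and the example component names are hypothetical.

```python
from typing import Iterable, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(trace: Sequence[str], n: int) -> List[NGram]:
    """Return every contiguous window of n components in a trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def learn_normal_ngrams(traces: Iterable[Sequence[str]], n: int) -> Set[NGram]:
    """Collect all n-grams observed in the (assumed fault-free) training traces."""
    known: Set[NGram] = set()
    for trace in traces:
        known.update(ngrams(trace, n))
    return known


def is_abnormal(trace: Sequence[str], known: Set[NGram], n: int) -> bool:
    """Flag a trace if it contains any n-gram never seen during training.

    A trace shorter than n has no n-grams and is therefore not flagged.
    """
    return any(gram not in known for gram in ngrams(trace, n))


# Hypothetical request traces: each entry is a component visited by a request.
training = [
    ["web", "app", "db", "app", "web"],
    ["web", "app", "cache", "app", "web"],
]
known = learn_normal_ngrams(training, n=2)
suspicious = is_abnormal(["web", "db", "app", "web"], known, n=2)
```

A fixed n trades accuracy against model size: small n misses long-range constraints, while large n inflates the set of known n-grams, which is the tension that varied-length n-grams are meant to address.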
knowledge discovery and data mining | 2005
Haifeng Chen; Guofei Jiang; Cristian Ungureanu; Kenji Yoshihira
The increasing complexity of today's systems makes fast and accurate failure detection essential for their use in mission-critical applications. Various monitoring methods provide a large amount of data about system behavior. Analyzing this data with advanced statistical methods holds the promise of not only detecting the errors faster, but also detecting errors which are difficult to catch with current monitoring tools. Two challenges to building such detection tools are: the high dimensionality of observation data, which makes the models expensive to apply, and frequent system changes, which make the models expensive to update. In this paper, we present algorithms to reduce the dimensionality of data in a way that makes it easy to adapt to system changes. We decompose the observation data into signal and noise subspaces. Two statistics, the Hotelling T² score and the squared prediction error (SPE), are calculated to represent the data characteristics in signal and noise subspaces respectively. Instead of tracking the original data, we use a sequentially discounting expectation maximization (SDEM) algorithm to learn the distribution of the two extracted statistics. A failure event can then be detected based on the abnormal change of the distribution. Applying our technique to component interaction data in a simple e-commerce application shows better accuracy than building independent profiles for each component. Additionally, experiments on synthetic data show that the detection accuracy is high even for changing systems.
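For readers unfamiliar with the two statistics named in the abstract, their usual definitions in PCA-based monitoring are given below. This is the textbook formulation, stated under the assumption that the signal subspace is spanned by the top k principal directions; the paper's own notation and construction may differ.

```latex
% x          : mean-centered observation vector in R^m (assumed notation)
% P_k        : m x k matrix whose columns p_1, ..., p_k are the top-k principal directions
% \Lambda_k  : diag(\lambda_1, ..., \lambda_k), the corresponding eigenvalues
\begin{align}
  T^2 &= x^{\top} P_k \Lambda_k^{-1} P_k^{\top} x
       = \sum_{i=1}^{k} \frac{(p_i^{\top} x)^2}{\lambda_i}, \\
  \mathrm{SPE} &= \bigl\lVert (I - P_k P_k^{\top})\, x \bigr\rVert^2 .
\end{align}
```

T² measures how far the projection of x lies from the origin inside the signal subspace, scaled by the variance along each direction, while SPE measures the energy of x left over in the noise subspace.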
european conference on computer systems | 2015
Hao Li; Asim Kadav; Erik Kruus; Cristian Ungureanu
Machine learning methods, such as SVM and neural networks, often improve their accuracy by using models with more parameters trained on large numbers of examples. Building such models on a single machine is often impractical because of the large amount of computation required. We introduce MALT, a machine learning library that integrates with existing machine learning software and provides data parallel machine learning. MALT provides abstractions for fine-grained in-memory updates using one-sided RDMA, limiting data movement costs during incremental model updates. MALT allows machine learning developers to specify the dataflow and apply communication and representation optimizations. Through its general-purpose API, MALT can be used to provide data-parallelism to existing ML applications written in C++ and Lua and based on SVM, matrix factorization and neural networks. In our results, we show MALT provides fault tolerance, network efficiency and speedup to these applications.
networking systems and applications for mobile handhelds | 2011
Hyojun Kim; Nitin Agrawal; Cristian Ungureanu
Conventional wisdom holds that storage is not a big contributor to application performance or energy consumption on mobile devices. Flash storage (the type most commonly used today) draws little power, and its performance is thought to exceed that of the network subsystem. In this paper we present initial evidence to the contrary even for common applications such as web browsing or application install. We find that just by varying the underlying flash storage, performance of web browsing over WiFi can vary roughly by 500%, and of application install by 300%. With a faster network (set up over USB), storage is taxed even more and the performance variation rose to roughly 700% for web browsing! The performance variation can be attributed to the characteristics of the storage device, the workload pattern (random or sequential), and the operating system itself. We also find that lower storage performance leads to increased CPU consumption, thus having an indirect impact on energy.
european conference on computer systems | 2015
Dorian Perkins; Nitin Agrawal; Akshat Aranya; Curtis Yu; Younghwan Go; Harsha V. Madhyastha; Cristian Ungureanu
Developers of cloud-connected mobile apps need to ensure the consistency of application and user data across multiple devices. Mobile apps demand different choices of distributed data consistency under a variety of usage scenarios. The apps also need to gracefully handle intermittent connectivity and disconnections, limited bandwidth, and client and server failures. The data model of the apps can also be complex, spanning inter-dependent structured and unstructured data, and needs to be atomically stored and updated locally, on the cloud, and on other mobile devices. In this paper we study several popular apps and find that many exhibit undesirable behavior under concurrent use due to inadequate treatment of data consistency. Motivated by the shortcomings, we propose a novel data abstraction, called a sTable, that unifies a tabular and object data model, and allows apps to choose from a set of distributed consistency schemes; mobile apps written to this abstraction can effortlessly sync data with the cloud and other mobile devices while benefiting from end-to-end data consistency. We build Simba, a data-sync service, to demonstrate the utility and practicality of our proposed abstraction, and evaluate it both by writing new apps and porting existing inconsistent apps to make them consistent. Experimental results show that Simba performs well with respect to sync latency, bandwidth consumption, and server throughput, and scales with both the number of users and the amount of data.
international conference on data engineering | 2013
Cristian Ungureanu; Biplob Debnath; Stephen Rago; Akshat Aranya
The performance and capacity characteristics of flash storage make it attractive to use as a cache. Recency-based cache replacement policies rely on an in-memory full index, typically a B-tree or a hash table, that maps each object to its recency information. Even though the recency information itself may take very little space, the full index for a cache holding N keys requires at least log N bits per key. This metadata overhead is undesirably high when used for very large flash-based caches, such as key-value stores with billions of objects. To solve this problem, we propose a new RAM-frugal cache replacement policy that approximates the least-recently-used (LRU) policy. It uses two in-memory Bloom sub-filters (TBF) for maintaining the recency information and leverages an on-flash key-value store to cache objects. TBF requires only one byte of RAM per cached object, making it suitable for implementing very large flash-based caches. We evaluate TBF through simulation on traces from several block stores and key-value stores, as well as evaluate it using the Yahoo! Cloud Serving Benchmark in a real system implementation. Evaluation results show that TBF achieves cache hit rate and operations per second comparable to those of LRU in spite of its much smaller memory requirements.
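The general idea of tracking recency with two Bloom filters instead of a full index can be sketched as follows. This is an illustrative approximation only: the class names, the hash construction, and the rule of rotating the filters after a fixed number of accesses are assumptions, not details taken from the paper, and the sketch omits the on-flash key-value store and the eviction scan.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """A basic Bloom filter over string keys (no deletions, no false negatives)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class TwoFilterRecency:
    """Approximate 'recently used' membership with a current and a previous filter."""

    def __init__(self, num_bits: int, num_hashes: int, rotate_every: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.rotate_every = rotate_every  # assumed rotation trigger: access count
        self.current = BloomFilter(num_bits, num_hashes)
        self.previous = BloomFilter(num_bits, num_hashes)
        self.touches = 0

    def touch(self, key: str) -> None:
        """Record an access; after rotate_every accesses, age the filters."""
        self.current.add(key)
        self.touches += 1
        if self.touches >= self.rotate_every:
            # The oldest generation is forgotten wholesale, which is what
            # makes this an approximation of LRU rather than exact LRU.
            self.previous = self.current
            self.current = BloomFilter(self.num_bits, self.num_hashes)
            self.touches = 0

    def recently_used(self, key: str) -> bool:
        """True if the key was (probably) accessed in the last two generations."""
        return key in self.current or key in self.previous
```

An eviction routine built on this would skip candidates for which `recently_used` returns true. Bloom filter false positives can only make a cold object look recent, so they cost some hit rate but never correctness.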
symposium on operating systems principles | 2015
Biplob Debnath; Alireza Haghdoost; Asim Kadav; Mohammed G. Khatib; Cristian Ungureanu
Phase Change Memory (PCM) is emerging as an attractive alternative to Dynamic Random Access Memory (DRAM) in building data-intensive computing systems. PCM offers read/write performance asymmetry that makes it necessary to revisit the design of in-memory applications. In this paper, we focus on in-memory hash tables, a family of data structures with wide applicability. We evaluate several popular hash-table designs to understand their performance under PCM. We find that for write-heavy workloads the designs that achieve the best performance for PCM differ from the ones that are best for DRAM, and that designs achieving a high load factor also cause a high number of memory writes. Finally, we propose PFHT, a PCM-Friendly Hash Table which presents a cuckoo hashing variant that is tailored to PCM characteristics, and offers a better trade-off between performance, the amount of writes generated, and the expected load factor than any of the existing DRAM-based implementations.
systems man and cybernetics | 2007
Haifeng Chen; Guofei Jiang; Cristian Ungureanu; Kenji Yoshihira
This paper proposes a novel failure-detection approach that can handle high-dimensional observation and frequent system changes. We extract two statistics from the subspace decomposition of observations, and use the mixture of Gaussians to model their probability density. Instead of monitoring the original data, the density model of extracted statistics is adaptively updated and examined regularly to detect failures. We also present a localization method to identify the faulty components once the failure happens. Applying our technique to monitor the component interactions in an e-commerce application shows satisfactory results in detecting a variety of injected failures.
systems man and cybernetics | 2007
Guofei Jiang; Haifeng Chen; Cristian Ungureanu; Kenji Yoshihira
Detection and diagnosis of faults in a large-scale distributed system is a formidable task. Interest in monitoring and using traces of user requests for fault detection has been on the rise recently. In this paper we propose novel fault detection methods based on abnormal trace detection. One essential problem is how to represent the large amount of training trace data compactly as an oracle. Our key contribution is the novel use of varied-length n-grams and automata to characterize normal traces. A new trace is compared against the learned automata to determine whether it is abnormal. We develop algorithms to automatically extract n-grams and construct multiresolution automata from training data. Further, both deterministic and multihypothesis algorithms are proposed for detection. We inspect the trace constraints of real application software and verify the existence of long n-grams. Our approach is tested in a real system with injected faults and achieves good results in experiments.