Featured Research

Performance

10-millisecond Computing

Although computation on data of unprecedented scale is becoming ever more complex, we argue that computers and smart devices should, and will, consistently deliver information and knowledge to human beings within a few tens of milliseconds. We coin a new term, 10-millisecond computing, to call attention to this class of workloads. 10-millisecond computing raises many challenges for both the software and hardware stacks. In this paper, using a typical workload, memcached, on a 40-core server (a mainstream server in the near future), we quantitatively measure the challenges that 10-ms computing poses to conventional operating systems. For better communication, we propose a simple metric, the outlier proportion, to measure quality of service: for N completed requests or jobs, if M of them have latencies exceeding the outlier threshold t, the outlier proportion is M/N. For a 1K-scale system running Linux (version 2.6.32), LXC (version 0.7.5), or XEN (version 4.0.0), we find, surprisingly, that in order to reduce the service outlier proportion to 10% (so that only 10% of users perceive QoS degradation), the outlier proportion of a single server has to be reduced by 871X, 2372X, and 2372X, respectively. We also discuss possible design spaces for 10-ms computing systems from the perspectives of datacenter architecture, networking, OS and scheduling, and benchmarking.
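
As an illustration of the metric only (not code from the paper), the following Python sketch computes the outlier proportion M/N from a list of request latencies, together with a back-of-the-envelope estimate of how small the per-server outlier proportion must be when one user request fans out to many servers; the independence and fan-out assumptions are ours, not the paper's measured methodology.

    def outlier_proportion(latencies_ms, threshold_ms):
        """Outlier proportion: M/N, where M of the N requests exceed the threshold t."""
        n = len(latencies_ms)
        m = sum(1 for x in latencies_ms if x > threshold_ms)
        return m / n

    # Hedged fan-out estimate (our assumption): if a request touches k servers and
    # each server independently produces an outlier with probability p, the
    # request-level outlier proportion is roughly 1 - (1 - p)**k.  Solving for p:
    def per_server_target(service_target=0.10, fanout=1000):
        return 1.0 - (1.0 - service_target) ** (1.0 / fanout)

    print(outlier_proportion([3.1, 8.4, 12.9, 25.0], threshold_ms=10.0))  # 0.5
    print(per_server_target())  # ~1.05e-4, orders of magnitude stricter than 10%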

Performance

A Basic Result on the Superposition of Arrival Processes in Deterministic Networks

Time-Sensitive Networking (TSN) and Deterministic Networking (DetNet) are emerging standards for enabling deterministic, delay-critical communication. This naturally (re-)calls attention to network calculus (NC), since a rich set of results for delay-guarantee analysis has already been developed there, and one might anticipate an immediate adoption of those existing NC results for TSN and DetNet. However, the fundamental difference between the traffic specification adopted in TSN and DetNet and the traffic models used in NC makes this difficult, in addition to a long-standing open challenge within NC itself. To address both issues, this paper considers a max-plus NC traffic model based on arrival time functions. For the former, a mapping between the TSN/DetNet traffic specification and the NC traffic model is proved; for the latter, a superposition property of the arrival-time-function-based NC traffic model is found and proved. Appealingly, the proved superposition property shows a clear analogy with that of a well-known counterpart traffic model in NC. These results are an important step towards a system theory for delay-guarantee analysis of TSN/DetNet networks.
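
To make the flavor of an arrival-time-function (max-plus) traffic model concrete, here is a hedged Python sketch; the minimum-spacing constraint spacing(k) and the merge-based superposition are illustrative simplifications of the general idea, not the paper's exact definitions or proofs.

    def conforms_min_spacing(arrival_times, spacing):
        """Check an arrival-time-function style constraint (illustrative): for all
        i <= j, the j-th packet arrives at least spacing(j - i) after the i-th."""
        A = sorted(arrival_times)
        n = len(A)
        return all(A[j] - A[i] >= spacing(j - i)
                   for i in range(n) for j in range(i + 1, n))

    def superpose(*flows):
        """Superposition of flows: the merged, sorted sequence of arrival times."""
        return sorted(t for flow in flows for t in flow)

    # Two flows, each keeping at least 2 time units between consecutive packets.
    f1, f2 = [0.0, 2.5, 5.0], [1.0, 3.5, 6.0]
    print(conforms_min_spacing(f1, lambda k: 2.0 * k))                 # True
    print(conforms_min_spacing(superpose(f1, f2), lambda k: 2.0 * k))  # False: the aggregate needs a looser constraint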

Performance

A Bounded Multi-Vacation Queue Model for Multi-stage Sleep Control 5G Base station

Modelling and control of energy consumption is an important problem in telecommunication systems. To model such systems, this paper presents a bounded multi-vacation queue model. The energy consumption predicted by the model shows an average error rate of 0.0177, and the predicted delay shows an average error rate of 0.0655, over 99 test instances. Subsequently, an optimization algorithm is proposed to minimize energy consumption without violating the delay bound. Furthermore, for a state-of-the-art 5G base station system configuration, numerical results show that the energy saving rate decreases as the traffic load increases.
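
For illustration of the optimization step only (our sketch, not the paper's algorithm or its queueing model), the following Python snippet chooses the lowest-energy sleep configuration whose model-predicted delay stays within the bound; the configurations and numbers are hypothetical.

    def pick_sleep_config(configs, delay_bound_ms):
        """Among candidate sleep-mode configurations, each with model-predicted
        (energy_w, delay_ms), choose the lowest-energy one whose predicted delay
        does not violate the delay bound."""
        feasible = [c for c in configs if c["delay_ms"] <= delay_bound_ms]
        if not feasible:
            raise ValueError("no sleep configuration satisfies the delay bound")
        return min(feasible, key=lambda c: c["energy_w"])

    # Hypothetical sleep depths, for illustration only.
    candidates = [
        {"mode": "no_sleep", "energy_w": 250.0, "delay_ms": 0.5},
        {"mode": "micro",    "energy_w": 180.0, "delay_ms": 1.2},
        {"mode": "light",    "energy_w": 140.0, "delay_ms": 4.8},
        {"mode": "deep",     "energy_w":  90.0, "delay_ms": 22.0},
    ]
    print(pick_sleep_config(candidates, delay_bound_ms=5.0))  # -> the "light" mode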

Performance

A Comparative Evaluation of Log-Based Process Performance Analysis Techniques

Process mining has gained traction over the past decade, and an impressive body of research has resulted in a variety of process mining approaches for measuring process performance. With this set of techniques available, organizations may find it difficult to identify which approach is best suited to their context, performance indicators, and data availability. In light of this challenge, this paper introduces a framework for categorizing and selecting performance analysis approaches based on existing research. We start from a systematic literature review to identify existing works that discuss how to measure process performance based on information retrieved from event logs. The proposed framework is then built from the information retrieved from these studies, taking into consideration different aspects of performance analysis.
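
As a minimal example of the kind of log-based measure such approaches compute (our sketch, assuming a simple case-id/activity/timestamp event-log schema rather than any specific technique from the survey), the following Python snippet derives per-case cycle times from an event log.

    from datetime import datetime

    def cycle_times(event_log):
        """Cycle time per case = last event timestamp - first event timestamp.
        event_log is an iterable of (case_id, activity, timestamp) tuples."""
        bounds = {}
        for case_id, _activity, ts in event_log:
            lo, hi = bounds.get(case_id, (ts, ts))
            bounds[case_id] = (min(lo, ts), max(hi, ts))
        return {case: hi - lo for case, (lo, hi) in bounds.items()}

    log = [
        ("c1", "register", datetime(2023, 5, 1, 9, 0)),
        ("c1", "approve",  datetime(2023, 5, 1, 9, 45)),
        ("c2", "register", datetime(2023, 5, 1, 9, 10)),
        ("c2", "approve",  datetime(2023, 5, 2, 11, 0)),
    ]
    print(cycle_times(log))  # {'c1': 0:45:00, 'c2': 1 day, 1:50:00}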

Performance

A Comparative Measurement Study of Deep Learning as a Service Framework

Big-data-powered Deep Learning (DL) and its applications have blossomed in recent years, fueled by three technological trends: large amounts of openly accessible digitized data, a growing number of DL software frameworks in open-source and commercial markets, and a selection of affordable parallel computing hardware. However, no single DL framework, to date, dominates in terms of performance and accuracy, even for baseline classification tasks on standard datasets, making the selection of a DL framework an overwhelming task. This paper takes a holistic approach to conducting an empirical comparison and analysis of four representative DL frameworks, with three unique contributions. First, given a selection of CPU-GPU configurations, we show that for a specific DL framework, different configurations of its hyper-parameters may have a significant impact on both the performance and the accuracy of DL applications. Second, to the best of our knowledge, this study is the first to identify opportunities for improving the training-time performance and the accuracy of DL frameworks by configuring parallel computing libraries and by tuning individual and multiple hyper-parameters. Third, we conduct a comparative measurement study of the resource consumption patterns of the four DL frameworks, including CPU and memory usage, together with their performance and accuracy implications and their correlations with varying hyper-parameter settings under different combinations of hardware and parallel computing libraries. We argue that this measurement study provides an in-depth empirical comparison and analysis of four representative DL frameworks, and offers practical guidance for service providers deploying and delivering DL as a Service (DLaaS) and for application developers and DLaaS consumers selecting the right DL frameworks for the right DL workloads.
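
The following hedged micro-benchmark sketch (ours, not the paper's measurement harness) shows the kind of experiment described: varying one parallel-computing-library setting (the intra-op thread count, via torch.set_num_threads) and one hyper-parameter (batch size) while timing a tiny CPU training loop. It assumes PyTorch is installed and merely stands in for whichever frameworks the study actually compares.

    import time
    import torch
    from torch import nn

    def time_training(num_threads, batch_size, steps=50):
        """Time a tiny CPU training loop under a given thread count and batch size."""
        torch.set_num_threads(num_threads)      # parallel computing library knob
        model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        loss_fn = nn.CrossEntropyLoss()
        x = torch.randn(batch_size, 256)
        y = torch.randint(0, 10, (batch_size,))
        start = time.perf_counter()
        for _ in range(steps):
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        return time.perf_counter() - start

    for threads in (1, 4):
        for bs in (64, 256):
            print(f"threads={threads} batch={bs}: {time_training(threads, bs):.3f}s")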

Performance

A Comparison of Parallel Graph Processing Implementations

The rapidly growing number of large network-analysis problems has led to the emergence of many parallel and distributed graph processing systems; one survey in 2014 identified over 80. Since then, the landscape has evolved: some packages have become inactive while more are being developed. Determining the best approach for a given problem is infeasible for most developers. To enable easy, rigorous, and repeatable comparison of the capabilities of such systems, we present an approach and associated software for analyzing the performance and scalability of parallel, open-source graph libraries. We demonstrate our approach on five graph processing packages: GraphMat, the Graph500, the Graph Algorithm Platform Benchmark Suite, GraphBIG, and PowerGraph, using synthetic and real-world datasets. We examine previously overlooked aspects of parallel graph processing performance, such as phases of execution and energy usage, for three algorithms: breadth-first search, single-source shortest paths, and PageRank, and we compare our results to Graphalytics.
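
As a small, hedged illustration of one benchmarked kernel (ours, not the paper's harness), the Python sketch below times a single-threaded breadth-first search on a synthetic graph; the study's comparisons instead use the listed packages, larger synthetic and real-world inputs, and also measure per-phase execution time and energy.

    import random
    import time
    from collections import deque

    def random_graph(n, avg_degree=8, seed=1):
        """Simple synthetic graph, a stand-in for the study's synthetic datasets."""
        rng = random.Random(seed)
        adj = [[] for _ in range(n)]
        for u in range(n):
            for _ in range(avg_degree):
                v = rng.randrange(n)
                adj[u].append(v)
                adj[v].append(u)
        return adj

    def bfs(adj, source=0):
        """Level-synchronous-style BFS returning hop distances from the source."""
        dist = [-1] * len(adj)
        dist[source] = 0
        q = deque([source])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return dist

    adj = random_graph(100_000)
    start = time.perf_counter()
    bfs(adj)
    print(f"BFS traversal time: {time.perf_counter() - start:.3f}s")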

Performance

A Constrained Shortest Path Scheme for Virtual Network Service Management

Virtual network services that span multiple data centers are important for supporting emerging data-intensive applications in fields such as bioinformatics and retail analytics. Successful virtual network service composition and maintenance requires flexible and scalable 'constrained shortest path management', both in the management plane for virtual network embedding (VNE) and network function virtualization service chaining (NFV-SC), and in the data plane for traffic engineering (TE). In this paper, we show analytically and empirically that leveraging constrained shortest paths within recent VNE, NFV-SC, and TE algorithms can lead to network utilization gains of up to 50% and higher energy efficiency. The management of complex VNE, NFV-SC, and TE algorithms can, however, be intractable for large-scale substrate networks due to the NP-hardness of the constrained shortest path problem. To address this scalability challenge, we propose a novel exact constrained shortest path algorithm, the 'Neighborhoods Method' (NM). NM uses novel search-space reduction techniques and has a theoretical quadratic speed-up, making it practically faster (by an order of magnitude) than recent branch-and-bound exhaustive-search solutions. Finally, we detail our NM-based SDN controller implementation in a real-world testbed to further validate the practical benefits of NM for virtual network services.
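
For context on the underlying problem (a standard label-setting baseline for the delay-constrained least-cost path problem, not the Neighborhoods Method itself), here is a hedged Python sketch; NM's search-space reduction techniques are not reproduced here.

    import heapq

    def constrained_shortest_path(graph, src, dst, delay_bound):
        """Least-cost path from src to dst whose total delay stays within delay_bound.
        graph[u] = list of (v, cost, delay) edges; labels are expanded in cost order,
        with dominance pruning per node."""
        best = {}                      # node -> expanded (cost, delay) labels
        pq = [(0, 0, src, [src])]
        while pq:
            cost, delay, u, path = heapq.heappop(pq)
            if u == dst:
                return cost, delay, path
            if any(c <= cost and d <= delay for c, d in best.get(u, [])):
                continue               # dominated by an already-expanded label
            best.setdefault(u, []).append((cost, delay))
            for v, ec, ed in graph.get(u, []):
                if delay + ed <= delay_bound:
                    heapq.heappush(pq, (cost + ec, delay + ed, v, path + [v]))
        return None                    # no feasible path within the delay bound

    # Tiny example: the cheapest path is too slow, so the solver picks the costlier one.
    g = {"a": [("b", 1, 10), ("c", 5, 1)], "b": [("d", 1, 10)], "c": [("d", 5, 1)]}
    print(constrained_shortest_path(g, "a", "d", delay_bound=5))  # (10, 2, ['a', 'c', 'd'])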

Performance

A Data-Assisted Reliability Model for Carrier-Assisted Cold Data Storage Systems

Cold data storage systems are used to enable long-term digital preservation of institutions' archives. The functionality common to cold and warm/hot data storage is that data is stored on some physical medium for read-back at a later time. In cold storage, however, write and read operations are not necessarily performed in the same geographical location; hence, third-party assistance (the carrier) is typically used to bring the medium and the drive together. The reliability modeling of such a decomposed system poses a few challenges that do not necessarily exist for warm/hot storage alternatives, such as fault detection and the absence of the carrier, all of which contribute to data unavailability. In this paper, we propose a generalized non-homogeneous Markov model that encompasses the aging of the carriers, in order to address the requirements of today's cold data storage systems in which data is encoded and spread across multiple nodes for long-term retention. We derive useful lower/upper bounds on the overall system availability. Furthermore, collected field data is used to estimate the parameters of a Weibull distribution and accurately predict the lifetime of the carriers in an example scale-out setting. We numerically demonstrate the significance of the carriers' presence and the key role their timely maintenance plays in the long-term reliability and availability of the stored content.
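
As a hedged illustration of the lifetime-fitting step (our sketch with synthetic stand-in data, since the paper's field data is not available here), the snippet below fits a Weibull distribution to carrier lifetimes with SciPy and queries survival probabilities.

    from scipy.stats import weibull_min

    # Synthetic stand-in for the collected field data (carrier ages at failure, in years).
    observed_lifetimes_yrs = weibull_min.rvs(c=1.8, scale=6.0, size=200, random_state=0)

    # Fit a two-parameter Weibull (location fixed at 0) to the observed lifetimes.
    shape, _loc, scale = weibull_min.fit(observed_lifetimes_yrs, floc=0)
    print(f"fitted shape k = {shape:.2f}, scale lambda = {scale:.2f} years")

    # Probability that a carrier is still functional after t years.
    for t in (1, 3, 5, 10):
        print(f"P(lifetime > {t} yr) = {weibull_min.sf(t, shape, loc=0, scale=scale):.3f}")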

Performance

A Fast Analytical Model of Fully Associative Caches

While the cost of computation is an easy-to-understand local property, the cost of data movement on cached architectures depends on global state, does not compose, and is hard to predict. As a result, programmers often fail to consider the cost of data movement. Existing cache models and simulators provide the missing information but are computationally expensive. We present a lightweight cache model for fully associative caches with a least-recently-used (LRU) replacement policy that gives fast and accurate results. We count cache misses without explicitly enumerating all memory accesses by using symbolic counting techniques twice: 1) to derive the stack distance for each memory access, and 2) to count the memory accesses whose stack distance is larger than the cache size. While this technique seems infeasible in theory, due to non-linearities after the first round of counting, we show that the counting problems are sufficiently linear in practice. Our cache model often computes its results within seconds and, unlike simulation, its execution time is largely independent of problem size. Our evaluation measures modeling errors below 0.6% on real hardware. By providing accurate data placement information, we enable memory-hierarchy-aware software development.
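
As a reference point (our sketch, not the paper's symbolic model), the Python function below counts misses for a fully associative LRU cache by explicitly enumerating the access trace and applying the stack-distance criterion, i.e. exactly the brute-force computation that the paper's analytical model avoids.

    def lru_misses(trace, cache_lines):
        """Miss count for a fully associative LRU cache: an access misses iff its
        stack distance (number of distinct lines touched since the previous access
        to the same line) is >= the cache capacity, or the line was never seen."""
        stack, misses = [], 0          # least-recently-used at the front, MRU at the end
        for line in trace:
            if line in stack:
                distance = len(stack) - 1 - stack.index(line)   # stack (reuse) distance
                stack.remove(line)
                if distance >= cache_lines:
                    misses += 1        # reused, but already evicted by LRU
            else:
                misses += 1            # cold miss: infinite stack distance
            stack.append(line)
        return misses

    trace = ["A", "B", "C", "A", "D", "B", "A"]
    print(lru_misses(trace, cache_lines=3))   # 5 misses: only the two short-distance reuses of A hit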

Performance

A Framework for Allocating Server Time to Spot and On-demand Services in Cloud Computing

Cloud computing delivers value to users by facilitating access to computing capacity in the periods when their need arises. One approach is to provide both on-demand and spot services on shared servers. The former allows users to access servers on demand at a fixed price, with different users occupying different periods of server time. The latter allows users to bid for the remaining unoccupied periods via dynamic pricing; without appropriate design, however, such periods may be arbitrarily small, since on-demand users arrive randomly. This is also the current service model adopted by Amazon Elastic Compute Cloud (EC2). In this paper, we provide the first integral framework for sharing server time between on-demand and spot services while optimally pricing spot instances. It guarantees that on-demand users are served quickly while spot users can stably utilize servers for a suitably long period once accepted, a key feature in making both on-demand and spot services accessible. Simulation results show that, by complementing the on-demand market with a spot market, a cloud provider can improve revenue by up to 464.7%. The framework is designed under assumptions that are met in real environments. It is a new tool that cloud operators can use to quantify the advantage of a hybrid spot and on-demand service, eventually making the case for operating such a service model in their own infrastructures.
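
As a toy, hedged illustration of why a spot market can add revenue (our back-of-the-envelope simulation, not the paper's pricing framework or its guarantees), the snippet below compares an on-demand-only operation with one that also sells otherwise-idle server-hours at a discounted spot price; all prices and utilization figures are hypothetical.

    import random

    def toy_revenue(hours=1_000, servers=100, p_on_demand=1.0, p_spot=0.3, util=0.6, seed=0):
        """Compare revenue from on-demand-only operation against a hybrid that also
        sells idle server-hours on a spot market at a discounted price."""
        rng = random.Random(seed)
        rev_od_only = rev_hybrid = 0.0
        for _ in range(hours):
            busy = sum(rng.random() < util for _ in range(servers))   # on-demand occupancy
            rev_od_only += busy * p_on_demand
            rev_hybrid += busy * p_on_demand + (servers - busy) * p_spot
        return rev_od_only, rev_hybrid

    od, hybrid = toy_revenue()
    print(f"revenue uplift from adding a spot market: {100 * (hybrid / od - 1):.1f}%")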

