Luo Mai | Researchain

Publication

Featured researches published by Luo Mai.

conference on emerging network experiment and technology | 2014

NetAgg: Using Middleboxes for Application-specific On-path Aggregation in Data Centres

Luo Mai; Lukas Rupprecht; Abdul Alim; Paolo Costa; Matteo Migliavacca; Peter R. Pietzuch; Alexander L. Wolf

Data centre applications for batch processing (e.g. map/reduce frameworks) and online services (e.g. search engines) scale by distributing data and computation across many servers. They typically follow a partition/aggregation pattern: tasks are first partitioned across servers that process data locally, and then those partial results are aggregated. This data aggregation step, however, shifts the performance bottleneck to the network, which typically struggles to support many-to-few, high-bandwidth traffic between servers. Instead of performing data aggregation at edge servers, we show that it can be done more efficiently along network paths. We describe NETAGG, a software platform that supports on-path aggregation for network-bound partition/aggregation applications. NETAGG exploits a middlebox-like design, in which dedicated servers (agg boxes) are connected by high-bandwidth links to network switches. Agg boxes execute aggregation functions provided by applications, which alleviates network hotspots because only a fraction of the incoming traffic is forwarded at each hop. NETAGG requires only minimal application changes: it uses shim layers on edge servers to redirect application traffic transparently to the agg boxes. Our experimental results show that NETAGG substantially improves the throughput of two sample applications, the Solr distributed search engine and the Hadoop batch processing framework. Its design allows for incremental deployment in existing data centres and incurs only a modest investment cost.
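The on-path aggregation idea in this abstract can be illustrated with a toy example. The sketch below is not NETAGG code: the word-count workload, the two-rack layout, and the names `aggregate` and `records_sent` are assumptions chosen for illustration. It compares how many records cross the network core when partial results are merged only at the final aggregator versus when a per-rack agg box merges them first.

```python
from collections import Counter

def aggregate(partials):
    """Merge partial word counts into one Counter (hypothetical aggregation function)."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def records_sent(partials):
    """Count the key/value records that would be forwarded for these partial results."""
    return sum(len(partial) for partial in partials)

# Assumed topology: two racks, two edge servers each, every server holding a partial result.
rack_a = [Counter(a=2, b=1), Counter(a=1, c=3)]
rack_b = [Counter(b=4, c=1), Counter(a=5, b=2)]

# Edge aggregation: every partial result crosses the core to the final aggregator.
core_records_edge = records_sent(rack_a + rack_b)
final_edge = aggregate(rack_a + rack_b)

# On-path aggregation: one agg box per rack merges locally and forwards a single result.
agg_a = aggregate(rack_a)
agg_b = aggregate(rack_b)
core_records_onpath = records_sent([agg_a, agg_b])
final_onpath = aggregate([agg_a, agg_b])

print(core_records_edge, core_records_onpath, final_edge == final_onpath)
```

The final result is the same in both cases because the merge is associative and commutative; only the amount of data forwarded at each hop changes.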

mobile adhoc and sensor systems | 2011

Load Balanced Rendezvous Data Collection in Wireless Sensor Networks

Luo Mai; Longfei Shangguan; Chao Lang; Junzhao Du; Hui Liu; Zhenjiang Li; Mo Li

We study the rendezvous data collection problem for a mobile sink in wireless sensor networks. We propose to jointly optimize trajectory planning for the mobile sink and workload balancing for the network. By doing so, the mobile sink can efficiently collect network-wide data within a given delay bound, and the network can eliminate the energy bottleneck to dramatically prolong its lifetime. This joint optimization problem is shown to be NP-hard, and we propose an approximation algorithm, named RPS-LB, to approach the optimal solution. In RPS-LB, according to observed properties of the median reference structure in the network, a series of Rendezvous Points (RPs) are selected to construct the trajectory for the mobile sink, and the derived approximation ratio of RPS-LB guarantees that the formed trajectory is comparable with the optimal solution. The workload allocated to each RP is mathematically proven to be balanced. We then relax the assumption that the mobile sink knows the location of each sensor node and present a localized, fully distributed version, RPS-LB-D, which largely improves the system's applicability in practice. We verify the effectiveness of our proposals via extensive experiments.

usenix annual technical conference | 2016

FLICK: developing and running application-specific network services

Abdul Alim; Richard G. Clegg; Luo Mai; Lukas Rupprecht; Eric Seckler; Paolo Costa; Peter R. Pietzuch; Alexander L. Wolf; Nikolai Sultana; Jonathon Andrew Crowcroft; Anil Madhavapeddy; Andrew W. Moore; Richard Mortier; Masoud Koleini; Luis Oviedo; Derek McAuley; Matteo Migliavacca

Data centre networks are increasingly programmable, with application-specific network services proliferating, from custom load-balancers to middleboxes providing caching and aggregation. Developers must currently implement these services using traditional low-level APIs, which neither support natural operations on application data nor provide efficient performance isolation. We describe FLICK, a framework for the programming and execution of application-specific network services on multi-core CPUs. Developers write network services in the FLICK language, which offers high-level processing constructs and application-relevant data types. FLICK programs are translated automatically to efficient, parallel task graphs, implemented in C++ on top of a user-space TCP stack. Task graphs have bounded resource usage at runtime, which means that the graphs of multiple services can execute concurrently without interference using cooperative scheduling. We evaluate FLICK with several services (an HTTP load-balancer, a Memcached router and a Hadoop data aggregator), showing that it achieves good performance while reducing development effort.

acm multimedia | 2017

TensorLayer: A Versatile Library for Efficient Deep Learning Development

Hao Dong; Akara Supratak; Luo Mai; Fangde Liu; Axel Oehmichen; Simiao Yu; Yike Guo

Recently we have observed emerging uses of deep learning techniques in multimedia systems. Developing a practical deep learning system is arduous and complex. It involves labor-intensive tasks for constructing sophisticated neural networks, coordinating multiple network models, and managing a large amount of training-related data. To facilitate such a development process, we propose TensorLayer, a versatile Python-based deep learning library. TensorLayer provides high-level modules that abstract sophisticated operations on neuron layers, network models, training data and dependent training jobs. While offering simplicity, it has transparent module interfaces that allow developers to flexibly embed low-level controls within a backend engine, with the aim of supporting fine-grained tuning of training. Real-world cluster experiment results show that TensorLayer is able to achieve competitive performance and scalability in critical deep learning tasks. TensorLayer was released in September 2016 on GitHub. Since then, it has become one of the most popular open-source deep learning libraries used by researchers and practitioners.

Archive | 2017

Extending programs with debug-related features, with application to hardware development

Nikolai Sultana; Salvator Galea; David J. Greaves; Marcin Wójcik; Noa Zilberman; Richard G. Clegg; Luo Mai; Richard Mortier; Peter R. Pietzuch; Jonathon Andrew Crowcroft; Andrew W. Moore

This work has received funding from the EPSRC NaaS grant EP/K034723/1, the European Union's Horizon 2020 research and innovation programme 2014-2018 under SSICLOPS (grant agreement No. 644866), the Leverhulme Trust Early Career Fellowship ECF-2016-289 and the Newton Trust.

very large data bases | 2018

Chi: a scalable and programmable control plane for distributed stream processing systems

Luo Mai; Kai Zeng; Rahul Potharaju; Le Xu; Steve Suh; Shivaram Venkataraman; Paolo Costa; Terry Kim; Saravanam Muthukrishnan; Vamsi Kuppa; Sudheer Dhulipalla; Sriram Rao

Stream-processing workloads and modern shared cluster environments exhibit high variability and unpredictability. Combined with the large parameter space and the diverse set of user SLOs, this makes modern streaming systems very challenging to statically configure and tune. To address these issues, in this paper we investigate a novel control-plane design, Chi, which supports continuous monitoring and feedback, and enables dynamic re-configuration. Chi leverages the key insight of embedding control-plane messages in the data-plane channels to achieve a low-latency and flexible control plane for stream-processing systems. Chi introduces a new reactive programming model and design mechanisms to asynchronously execute control policies, thus avoiding global synchronization. We show how this allows us to easily implement a wide spectrum of control policies targeting different use cases observed in production. Large-scale experiments using production workloads from a popular cloud provider demonstrate the flexibility and efficiency of our approach.
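The key insight of embedding control-plane messages in the data-plane channels can be sketched in a few lines. This is a simplified illustration, not Chi's API: the `Data` and `Control` message types, the `Operator` class, and the `scale` parameter are all hypothetical. The operator reads one ordered channel and applies a reconfiguration at the exact point where the control message arrives, so no separate global synchronization step is needed.

```python
from dataclasses import dataclass

@dataclass
class Data:
    value: int

@dataclass
class Control:
    new_scale: int  # hypothetical configuration parameter carried in-band

class Operator:
    """Toy stream operator that multiplies each value by a configurable scale."""

    def __init__(self, scale=1):
        self.scale = scale
        self.output = []

    def on_message(self, msg):
        if isinstance(msg, Control):
            # Reconfigure in-line: tuples before this point used the old scale,
            # tuples after it use the new one.
            self.scale = msg.new_scale
        else:
            self.output.append(msg.value * self.scale)

# A single channel carrying both data tuples and a control message, in order.
channel = [Data(1), Data(2), Control(new_scale=10), Data(3)]

op = Operator()
for msg in channel:
    op.on_message(msg)

print(op.output)
```

Because the control message travels in the same ordered channel as the data, its position in the stream defines which tuples see the old configuration and which see the new one.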

Archive | 2013