Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hai Jin is active.

Publication


Featured research published by Hai Jin.


Network and Parallel Computing | 2013

Energy Efficient Task Scheduling in Mobile Cloud Computing

Dezhong Yao; Chen Yu; Hai Jin; Jiehan Zhou

Cloud computing can enhance the computing capability of mobile systems through offloading. However, communication between the mobile device and the cloud is not free: transmitting large amounts of data to the cloud consumes far more energy than processing the data on the mobile device, especially under low-bandwidth conditions. Furthermore, some processing tasks, such as encoding and rendering, act like compression algorithms that reduce the size of the raw data before it is sent to the server, and can thus avoid transmitting large data between the mobile device and the server. In this paper, we present an energy-efficient task scheduling strategy (EETS) that determines which kinds of tasks, with a given amount of data, should be offloaded under different environments. We have evaluated the scheduler on an Android smartphone. The results show that our strategy chooses the action that minimizes system energy usage with 99% accuracy.
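The abstract does not give the scheduler's decision rule, but the Python sketch below illustrates the kind of energy comparison such an offloading scheduler might make; the power figures, parameter names, and formula are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch of an offloading decision in the spirit of EETS.
# All parameters (power figures, CPU frequency, names) are assumptions.

def should_offload(data_bytes, local_cycles, bandwidth_bps,
                   p_compute_w=0.9, p_radio_w=1.3, cpu_hz=1.0e9):
    """Return True if offloading is estimated to cost less energy than
    processing the task locally on the mobile device."""
    # Energy to run the task on the device: execution time * CPU power.
    e_local = (local_cycles / cpu_hz) * p_compute_w
    # Energy to ship the input data to the cloud: transfer time * radio power.
    e_offload = (data_bytes * 8 / bandwidth_bps) * p_radio_w
    return e_offload < e_local

# A compute-heavy task with little data favors offloading,
# while a data-heavy task on a slow link favors local execution.
print(should_offload(data_bytes=50_000, local_cycles=5e9, bandwidth_bps=2e6))      # True
print(should_offload(data_bytes=50_000_000, local_cycles=1e8, bandwidth_bps=5e5))  # False
```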


Frontiers of Computer Science in China | 2014

An adaptive switching scheme for iterative computing in the cloud

Yu Zhang; Xiaofei Liao; Hai Jin; Li Lin; Feng Lu

The delta-based accumulative iterative computation (DAIC) model has been proposed to support iterative algorithms in either a synchronous or an asynchronous way. However, the synchronous and asynchronous DAIC models each perform well only under certain conditions and degrade under others, due to high synchronization cost or many redundant activations, respectively. As a result, the overall performance of both DAIC models suffers from the serious network jitter and load jitter caused by multitenancy in the cloud. In this paper, we develop a system, named HybIter, to guarantee the performance of iterative algorithms under different conditions. Through an adaptive execution model selection scheme, it efficiently switches between the synchronous and asynchronous DAIC models to adapt to changing conditions, always obtaining the best performance in the cloud. Experimental results show that our approach improves the performance of current solutions by up to 39.0%.
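The paper's selection scheme is not reproduced here, but the minimal sketch below shows what an adaptive switch between the synchronous and asynchronous DAIC models could look like; the metrics, thresholds, and rule are assumptions for illustration only.

```python
# Minimal sketch of an adaptive execution-model selector in the spirit of HybIter.
# The observed metrics, thresholds, and switching rule are illustrative assumptions.

SYNC, ASYNC = "sync", "async"

def choose_model(avg_barrier_wait_s, redundant_activation_ratio,
                 wait_threshold=0.5, redundancy_threshold=0.3):
    """Pick the DAIC execution model for the next epoch.

    Long barrier waits (e.g., stragglers caused by network or load jitter)
    argue for the asynchronous model; a high fraction of redundant
    activations argues for the synchronous model."""
    if avg_barrier_wait_s > wait_threshold:
        return ASYNC
    if redundant_activation_ratio > redundancy_threshold:
        return SYNC
    return SYNC  # default to synchronous when neither cost dominates

# Example: heavy load jitter pushes the engine to asynchronous execution.
print(choose_model(avg_barrier_wait_s=1.2, redundant_activation_ratio=0.1))  # async
```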


Concurrency and Computation: Practice and Experience | 2013

A measurement-based study on user management in private BitTorrent communities

Honglei Jiang; Hai Jin; Song Guo; Xiaofei Liao

Beyond traditional BitTorrent, a new genre of peer-to-peer communication protocol for worldwide file sharing is rapidly evolving towards private BitTorrent (PT), and in recent years a proliferation of PT communities has emerged. To enhance the user experience, account-based share-ratio enforcement (SRE) has been developed and widely adopted. Whereas existing studies mainly treat SRE as an incentive, we discover that it also plays a critical role in selecting and filtering users. In addition to SRE, this paper studies a rich set of user management rules, such as registration management, banning policies, and user caste systems, exploring their effects on user behavior, download performance, content availability, and system scalability. The measurement results presented in this paper are based on large-scale experiments conducted over six representative PT sites for more than a year. We find that stricter registration leads to fewer new users, resulting in a scalability problem that is critical for PT communities, because download performance and content availability depend not only on the contribution of users but also on the population of the community. Our measurement and analysis point to a direction for the design of new incentive mechanisms that take the difficulty of enrollment into consideration.
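As a rough illustration of the SRE mechanism the study measures, the sketch below classifies an account by its upload/download ratio; the thresholds and grace allowance are assumptions, not rules taken from any of the measured sites.

```python
# Illustrative sketch of account-based share-ratio enforcement (SRE).
# The minimum ratio and the new-user grace allowance are assumed values.

def sre_status(uploaded_bytes, downloaded_bytes,
               min_ratio=0.5, grace_bytes=10 * 2**30):
    """Classify an account the way an SRE-based PT tracker might:
    new accounts get a download grace allowance, after which their
    upload/download ratio must stay above a minimum threshold."""
    if downloaded_bytes <= grace_bytes:
        return "ok"                      # still within the new-user grace period
    ratio = uploaded_bytes / downloaded_bytes
    if ratio >= min_ratio:
        return "ok"
    return "warned" if ratio >= min_ratio / 2 else "banned"

print(sre_status(60 * 2**30, 100 * 2**30))  # "ok": ratio 0.6 meets the 0.5 minimum
print(sre_status(20 * 2**30, 100 * 2**30))  # "banned": ratio 0.2 is below half the minimum
```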


Dependable, Autonomic and Secure Computing | 2015

A Map-Reduce Method for Training Autoencoders on Xeon Phi

Qiongjie Yao; Xiaofei Liao; Hai Jin

The stacked autoencoder is a deep learning model that consists of multiple autoencoders and has been widely applied in numerous machine learning applications. A significant amount of effort has been made to increase the size of deep learning models, in terms of both the size of the training dataset and the number of model parameters, to improve performance. However, training a large deep learning model is highly time consuming. Recent studies have applied CPU clusters with thousands of machines, as well as single GPUs or GPU clusters, to train large-scale deep learning models. As a high-performance coprocessor like the GPU, the Xeon Phi can be an alternative tool for training large-scale deep learning models on a single machine. The Xeon Phi can be viewed as a small cluster: it features about 60 cores, and each core supports four hardware threads. This massive parallelism offsets the low computing capacity of each core, but makes an efficient parallel autoencoder design challenging. In this paper, we analyze the training algorithm of autoencoders based on matrix operations and point out the thread-oversubscription problem, which results in performance degradation. Based on this observation, we propose a map-reduce implementation of autoencoders on the Xeon Phi coprocessor. Our basic idea is to parallelize multiple autoencoder model replicas with the bulk synchronous parallel (BSP) communication model, in which the parameters are updated after the computations of all replicas are completed. Each thread is responsible for one model replica, and all replicas work together on the same mini-batch. This data-parallelism method is suitable for training autoencoders on the Xeon Phi and can be extended to an asynchronous parallel training method without thread oversubscription. In our experiments, our implementation runs four times faster than the sequential implementation, and the speedup remains stable as the autoencoder model is enlarged.
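The following NumPy sketch illustrates the BSP data-parallel scheme described above: several replicas process shards of the same mini-batch, and a single averaged update is applied at the barrier. The tiny tied-weight linear autoencoder and all sizes are illustrative assumptions, not the paper's Xeon Phi implementation.

```python
# BSP data-parallel training step: one replica per shard, parameters updated
# only after all replicas finish. The model is a toy tied-weight linear
# autoencoder chosen purely for illustration.
import numpy as np

def bsp_step(weights, mini_batch, num_replicas, lr=0.01):
    shards = np.array_split(mini_batch, num_replicas)
    grads = []
    for shard in shards:                      # one replica (thread) per shard
        recon = shard @ weights @ weights.T   # encode then decode
        err = recon - shard
        # gradient of 0.5 * ||X W W^T - X||^2 w.r.t. W (tied weights)
        g = shard.T @ err @ weights + err.T @ shard @ weights
        grads.append(g / len(shard))
    # BSP barrier: apply the averaged update once every replica is done.
    weights -= lr * np.mean(grads, axis=0)
    return weights

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 4))        # 8 visible units, 4 hidden units
batch = rng.normal(size=(64, 8))
W = bsp_step(W, batch, num_replicas=4)
```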


Mobile Networks and Applications | 2017

Automatically Setting Parameter-Exchanging Interval for Deep Learning

Siyuan Wang; Xiaofei Liao; Xuepeng Fan; Hai Jin; Qiongjie Yao; Yu Zhang

Parameter-server frameworks play an important role in scaling up distributed deep learning algorithms. However, the constant growth of neural network sizes has led to a serious bottleneck in exchanging parameters across machines. Recent efforts rely on manually setting a parameter-exchanging interval to reduce communication overhead, regardless of the parameter server's resource availability; an inappropriate interval can lead to poor performance or inaccurate results, and request bursts may occur, exacerbating the bottleneck. In this paper, we propose an approach that automatically sets the optimal exchanging interval, aiming to remove the parameter-exchanging bottleneck and to utilize resources evenly without losing training accuracy. The key idea is to increase the interval on different training nodes based on knowledge of the available resources, and to choose a different interval for each slave node to avoid request bursts. We adopted this method to optimize the parallel stochastic gradient descent algorithm, speeding up the parameter-exchanging process by eight times.
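The paper's interval-setting formula is not given in the abstract; the sketch below shows one plausible way to lengthen the interval when the parameter server is busy and to stagger each worker's exchanges so requests do not arrive in bursts. The scaling rule and all parameters are assumptions.

```python
# Illustrative interval assignment: scale the parameter-exchange interval by
# the server's spare capacity and offset each worker so requests are spread
# over one interval. Not the paper's actual method.

def assign_intervals(num_workers, base_interval_iters, server_utilization):
    """Return (interval, offset) pairs, one per worker.

    A busier parameter server gets longer intervals; per-worker offsets
    spread the exchange requests over one interval to avoid bursts."""
    spare = max(0.1, 1.0 - server_utilization)
    interval = max(1, round(base_interval_iters / spare))
    return [(interval, (w * interval) // num_workers) for w in range(num_workers)]

for worker, (interval, offset) in enumerate(assign_intervals(4, 10, 0.5)):
    print(f"worker {worker}: exchange every {interval} iters, starting at iter {offset}")
```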


Network and Parallel Computing | 2013

FRESA: A Frequency-Sensitive Sampling-Based Approach for Data Race Detection

Neng Huang; Zhiyuan Shao; Hai Jin

Concurrent programs are difficult to debug due to their inherent concurrency and nondeterminism, and race conditions are among the main problems. Previous work on dynamic race detection includes fast but imprecise methods that report false alarms, and slow but precise ones that never report false alarms. Some researchers have combined these two methods, but the overhead is still massive. This paper exploits the insight that fully recording every access in the detector is unnecessary in most cases, and that prior sampling methods still leave room to reduce overhead while guaranteeing precision. We therefore use a frequency-sensitive sampling approach: with our sampling-dispatch model, we can drop most of the unnecessary detection overhead. Experimental results on the DaCapo benchmarks show that our heuristic sampling race detector is faster and incurs lower overhead than traditional race detectors, with no loss in precision and without reporting false alarms.
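A minimal sketch of frequency-sensitive sampling dispatch follows: sites that execute frequently are sampled less often, on the assumption that races tend to hide in rarely exercised code. The rates and data structures are illustrative and do not reproduce FRESA's actual design.

```python
# Frequency-sensitive sampling dispatch (illustrative): each code site starts
# fully sampled and its sampling rate decays as it executes more often.
import random
from collections import defaultdict

class FrequencySensitiveSampler:
    def __init__(self, max_rate=1.0, min_rate=0.01, decay=0.9):
        self.rate = defaultdict(lambda: max_rate)   # per-site sampling rate
        self.min_rate = min_rate
        self.decay = decay

    def should_check(self, site_id):
        """Decide whether to hand this memory access to the full race detector."""
        r = self.rate[site_id]
        # Cool the site down: the more often it runs, the less often we sample it.
        self.rate[site_id] = max(self.min_rate, r * self.decay)
        return random.random() < r

sampler = FrequencySensitiveSampler()
checked = sum(sampler.should_check("loop_body@foo.c:42") for _ in range(10_000))
print(f"sampled {checked} of 10000 accesses at a hot site")
```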


Mobile Networks and Applications | 2013

Editorial for Special Issue on New Technologies and Applications for Wireless Communications & Mobile Cloud Computing

Taeshik Shon; Athanasios V. Vasilakos; Bharat K. Bhargava; Ivan Stojmenovic; Hai Jin; Albert Y. Zomaya

Editorial: This special issue features eight selected high-quality papers. The first article, "A Cost-Effective Methodology Applied to Videoconference Services on Hybrid Clouds" by Javier Cervino, Pedro Rodriguez, Irena Trajkovska, Fernando Escribano and Joaquin Salvachua, tackles the optimization of applications in multi-provider hybrid cloud scenarios from an economic point of view, introducing a novel solution that makes maximum use of the divide-and-rule principle. The article describes a methodology for creating cost-aware cloud applications that can be broken down into the three most important components of cloud infrastructures: computation, network, and storage. A real videoconference system was modified to evaluate this idea with both theoretical and empirical experiments, and the resulting system has become a widely used tool in several national and European projects for e-learning and collaboration. The second article, "An Efficient Dynamic Integration Middleware for Cyber-Physical Systems in Mobile Environments" by Young-Sik Jeong, Sang Oh Park and Jong Hyuk Park, proposes a cyber-physical system (CPS) middleware framework that ensures interoperability and communication between heterogeneous components in a global CPS network. A CPS is a tight integration of a system's computational and physical elements; CPS technology builds on the older discipline of embedded systems, and CPS applications can be found in diverse industry sectors such as smart homes, health care, and transportation. The article assumes that a global CPS network integrating different CPS networks will appear in the near future; through local and global communications, the proposed middleware makes mobile devices in different networks interoperable. The third article, "Particle Swarm Optimization with Skyline Operator for Fast Cloud-based Web Service Composition" by Shangguang Wang, Qibo Sun, Hua Zou, and Fangchun Yang, proposes a fast cloud-based Web service composition approach.


International Journal of Parallel Programming | 2015

CCAP: A Cache Contention-Aware Virtual Machine Placement Approach for HPC Cloud

Hai Jin; Hanfeng Qin; Song Wu; Xuerong Guo


The Computer Journal | 2015

Deduplication-Based Energy Efficient Storage System in Cloud Environment

He Li; Mianxiong Dong; Xiaofei Liao; Hai Jin


International Journal of Parallel Programming | 2015

Accelerating Smith-Waterman Alignment of Species-Based Protein Sequences on GPU

Xiaowen Feng; Hai Jin; Ran Zheng; Lei Zhu; Weiqi Dai

Collaboration


Dive into Hai Jin's collaborations.

Top Co-Authors

Xiaofei Liao (Huazhong University of Science and Technology)
Honglei Jiang (Huazhong University of Science and Technology)
Qiongjie Yao (Huazhong University of Science and Technology)
Weiqi Dai (Huazhong University of Science and Technology)
Yu Zhang (Huazhong University of Science and Technology)
Chen Yu (Huazhong University of Science and Technology)
Deqing Zou (Huazhong University of Science and Technology)
Dezhong Yao (Huazhong University of Science and Technology)
Feng Lu (Huazhong University of Science and Technology)
Gang Chen (Huazhong University of Science and Technology)