Gaojin Wen
Chinese Academy of Sciences
Publication
Featured research published by Gaojin Wen.
cluster computing and the grid | 2012
Shucai Xiao; Pavan Balaji; James Dinan; Qian Zhu; Rajeev Thakur; Susan Coghlan; Heshan Lin; Gaojin Wen; Jue Hong; Wu-chun Feng
This paper presents a framework to support transparent, live migration of virtual GPU accelerators in a virtualized execution environment. Migration is a critical capability in such environments because it provides support for fault tolerance, on-demand system maintenance, resource management, and load balancing in the mapping of virtual to physical GPUs. Techniques to increase responsiveness and reduce migration overhead are explored. The system is evaluated by using four application kernels and is demonstrated to provide low migration overheads. Through transparent load balancing, our system provides a speedup of 1.7 to 1.9 for three of the four application kernels.
conference on decision and control | 2011
Gaojin Wen; Jue Hong; Cheng Zhong Xu; Pavan Balaji; Shengzhong Feng; Pingchuang Jiang
With the rapid advance of cloud computing, large-scale data centers play a key role. The energy consumption of such distributed systems has become a prominent problem and has received much attention. Among existing energy-saving methods, application scheduling can reduce energy consumption by replacing and consolidating applications to decrease the number of running servers. However, most application scheduling approaches do not consider the energy cost of network devices, which also accounts for a large portion of the power consumption in large data centers. In this paper we propose a Hierarchical Scheduling Algorithm for applications, namely HSA, to minimize the energy consumption of both servers and network devices. In HSA, a Dynamic Maximum Node Sorting (DMNS) method is developed to optimize the application placement on servers connected to a common switch. Hierarchical crossing-switch adjustment is applied to further reduce the number of running servers. As a result, both the number of running servers and the amount of data transfer can be greatly reduced. The time complexity of HSA is Θ(n log(log n)), where n is the total number of servers in the data center. Its stability is verified via simulations. Experiments show that HSA outperforms existing algorithms.
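To make the consolidation idea concrete, here is a minimal sketch of packing applications onto as few servers as possible. It is not HSA or DMNS, whose details the abstract does not give. It uses the classic first-fit decreasing heuristic instead, and it ignores switches and network cost entirely. The function name `consolidate_ffd`, the single scalar demand per application, and the uniform server capacity are all assumptions made for illustration.

```python
def consolidate_ffd(demands, capacity):
    """Place application demands on servers with first-fit decreasing.

    Illustrative stand-in for consolidation only; this is NOT the HSA/DMNS
    method from the abstract. Each demand is one scalar resource amount and
    every server has the same capacity (both are simplifying assumptions).
    """
    if any(d > capacity for d in demands):
        raise ValueError("an application exceeds the capacity of a single server")

    servers = []  # servers[i] is the list of demands placed on server i
    free = []     # free[i] is the remaining capacity of server i

    # Largest applications first, each on the first server that still has room.
    for d in sorted(demands, reverse=True):
        for i, remaining in enumerate(free):
            if d <= remaining:
                servers[i].append(d)
                free[i] -= d
                break
        else:
            # No running server can host it, so power on another one.
            servers.append([d])
            free.append(capacity - d)
    return servers


placement = consolidate_ffd([4, 8, 1, 4, 2, 1], capacity=10)
print(len(placement), "servers in use:", placement)
```

Every server left out of the returned placement is one that could be switched off. A scheduler that also accounts for network devices, as the abstract describes, would additionally have to consider which switch each server hangs off.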
The Visual Computer | 2006
Gaojin Wen; Zhaoqi Wang; Shihong Xia; Dengming Zhu
Based on the classic absolute orientation technique, a new method for least-squares fitting of multiple point sets in m-dimensional space is proposed, analyzed and extended to a weighted form in this paper. This method generates a fixed point set from k corresponding original m-dimensional point sets and minimizes the mean squared error between the fixed point set and these k point sets under the similarity transformation. Experiments and interesting applications are presented to show its efficiency and accuracy.
virtual reality software and technology | 2006
Gaojin Wen; Zhaoqi Wang; Shihong Xia; Dengming Zhu
In this paper, we propose a practical and systematic solution to the problem of mapping 3D marker position data recorded by optical motion capture systems to joint trajectories together with a matching skeleton, based on least-squares fitting techniques. First, we preprocess the raw data and estimate the joint centers using related efficient techniques. Second, a skeleton of fixed length that precisely matches the joint centers is generated by an articulated skeleton fitting method. Finally, we calculate and rectify joint angles with a minimum angle modification technique. We present the results of our approach as applied to several motion-capture behaviors, which demonstrate the positional accuracy and usefulness of our method.
computer graphics international | 2005
Gaojin Wen; Dengming Zhu; Shihong Xia; Zhaoqi Wang
The absolute orientation technique, which minimizes the mean squared error between two matched point sets under similarity transformations, has been widely applied in photogrammetry, robotics, object motion analysis, and object pose estimation following recognition. Based on it, this paper proposes and proves a total least squares fitting algorithm that generates a fixed point set from k corresponding original point sets and minimizes the mean squared error between the fixed point set and these k point sets. Experiments and interesting applications are also presented to show its efficiency, accuracy and robustness.
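As a small illustration of the two-set absolute orientation problem described above, the sketch below fits a similarity transformation (scale, rotation, translation) between two matched point sets. It is restricted to 2D, where treating points as complex numbers gives a closed-form least-squares solution, and it does not cover the paper's k-set algorithm. The function name `fit_similarity_2d` and the sample data are assumptions made for illustration.

```python
import cmath


def fit_similarity_2d(src, dst):
    """Least-squares similarity transform mapping src onto dst (2D only).

    src and dst are equally long lists of (x, y) tuples in matching order.
    Returns (scale, rotation_in_radians, (tx, ty)).
    Illustrative 2D special case, not the k-set method from the abstract.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need two matched point sets with at least two points")

    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    n = len(z)

    # Centering both sets removes the translation from the problem.
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    zc = [p - z_mean for p in z]
    wc = [q - w_mean for q in w]

    denom = sum(abs(p) ** 2 for p in zc)
    if denom == 0:
        raise ValueError("source points are all identical")

    # a = scale * exp(i * rotation) minimizes sum |a * zc - wc|^2.
    a = sum(p.conjugate() * q for p, q in zip(zc, wc)) / denom
    t = w_mean - a * z_mean
    return abs(a), cmath.phase(a), (t.real, t.imag)


# Hypothetical data: dst is src rotated by 90 degrees, scaled by 2,
# then shifted by (3, -1).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(3.0, -1.0), (3.0, 1.0), (1.0, -1.0)]
scale, angle, shift = fit_similarity_2d(src, dst)
print(scale, angle, shift)
```

In m dimensions the rotation is usually recovered from a singular value decomposition or a quaternion formulation rather than from complex arithmetic; the centering step and the ratio that yields the scale carry over unchanged.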
Computing in Science and Engineering | 2010
Siyuan Liu; Gaojin Wen; Jianping Fan
Because it involves a huge amount of hydraulic engineering work, feasibility analysis of large-scale water-diversion projects is complex. Based on 3D modeling and visualization techniques, this water-diversion project simulation and evaluation system builds 3D digital geological models to aid in project development and assessment. Initial results from a project on rivers in China show the system's considerable promise.
international supercomputing conference | 2013
Jue Hong; Pavan Balaji; Gaojin Wen; Bibo Tu; Junming Yan; Cheng Zhong Xu; Shengzhong Feng
Achieving fair resource sharing is rapidly becoming an essential requirement in cluster computing systems. Although many fair scheduling algorithms have been proposed in recent decades, controlling resource sharing among jobs on servers remains a challenging problem that, if not handled well, may result in chaotic resource contention and service-level agreement violation of jobs. To address this problem, we propose a resource container–based job management approach for fair resource sharing. In our approach, we first design and implement a general container-based job management module, providing lightweight and fine-grained resource allocation and isolation for job execution. With this module, we propose a resource-aware management scheme to enable fair resource sharing in job scheduling and dispatching. We conduct experiments by implementing the proposed module and applying the scheme on TCluster, a self-developed cluster computing system of a worldwide top Internet corporation. Results show that our approach performs well in guaranteeing fair resource sharing with negligible overhead.
international congress on image and signal processing | 2010
Gaojin Wen; Shengzhong Feng; Wen Xiong; Zhendong Bei; Juanjuan Zhao; Nian Yang
In this paper we address the problem of automated visual hull computation. This problem is important for various tasks, from three-dimensional object modeling to markerless motion capture. Unlike classical visual hull methods, which are based on volumes or surfaces, we compute the visual hull through operations on line segments. General intersection and surface reconstruction operations on line segments are proposed together with efficient algorithms. Experimental results demonstrate that the proposed approach can generate a robust and reliable visual hull.
Archive | 2013
Guijuan Zhang; Gaojin Wen; Shengzhong Feng
GPU-based fluid animation is a hot topic in many applications such as films, cartoons and games. As flow phenomena exhibit highly complex behaviors and rich visual details, it is necessary to explore the intrinsic multi-scale property of fluid animation. In this paper, we present a multi-scale fluid animation method on GPU. Our method is designed to animate fluid details at grid and sub-grid scale with high efficiency. In our method, the motion of the liquid surface is obtained by solving the Navier-Stokes equations and the Level Set equation, while the dynamics of fluid sprays are governed by an SPH solution. The interaction between the liquid surface and the sprays is modeled by a two-way coupling algorithm that can be executed efficiently on GPU. Experimental results show that the proposed GPU-based acceleration method significantly improves the processing speed of multi-scale fluid animation while producing interesting details.
Archive | 2013
Yechen Gui; Shenzhong Feng; Gaojin Wen; Guijuan Zhang; Yanyi Wan; Tao Liu
The binomial tree model is often used for option pricing in the financial market. With this method, obtaining a highly accurate option price is computationally expensive. Although existing methods running on CPU clusters have improved the efficiency significantly, there is still a large gap between the actual and the desired performance. In this paper, we parallelize this model on CUDA to further improve the efficiency. We optimize our method according to principles of memory hierarchy and extend it to support multiple GPUs. Experiments on a single Tesla C1060 GPU show an average speedup of 285× compared with a single CPU node. Furthermore, for a data size of 64 K, GPU performance reaches 315 Gflops, which outperforms the earlier version on the Sun station by a factor of about 100. The maximum performance reached with 108 GPU nodes is 30 Tflops.
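For readers unfamiliar with the binomial tree model, a minimal serial sketch follows. It assumes a European call option and the Cox-Ross-Rubinstein parameterization; the abstract specifies neither, and this is not the authors' CUDA implementation. The function name `binomial_call_price` and the sample parameters are assumptions made for illustration.

```python
import math


def binomial_call_price(spot, strike, rate, sigma, maturity, steps):
    """Price a European call on a Cox-Ross-Rubinstein binomial tree.

    Serial reference sketch under stated assumptions (European exercise,
    CRR up/down factors); not the CUDA method from the abstract.
    """
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    # Risk-neutral probability of an up move.
    p = (math.exp(rate * dt) - down) / (up - down)
    if not 0.0 < p < 1.0:
        raise ValueError("time step too coarse for these parameters")
    discount = math.exp(-rate * dt)

    # Option payoffs at maturity, one per terminal node (j up moves).
    values = [
        max(spot * up ** j * down ** (steps - j) - strike, 0.0)
        for j in range(steps + 1)
    ]

    # Backward induction: each node depends only on its two children.
    for level in range(steps, 0, -1):
        values = [
            discount * (p * values[j + 1] + (1.0 - p) * values[j])
            for j in range(level)
        ]
    return values[0]


price = binomial_call_price(spot=100.0, strike=100.0, rate=0.05,
                            sigma=0.2, maturity=1.0, steps=500)
print(round(price, 2))
```

Accuracy improves with the number of steps while the work grows quadratically, which is the cost the abstract refers to. Within one level of the backward pass the nodes are independent of each other, so that inner update is the natural candidate for data-parallel execution.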