Xiaoxin Tang
Shanghai Jiao Tong University
Publication
Featured research published by Xiaoxin Tang.
high performance distributed computing | 2014
Xiaoxin Tang; Steven Mills; David M. Eyers; Kai-Cheung Leung; Zhiyi Huang; Minyi Guo
K Nearest Neighbors (k-NN) search is a widely used category of algorithms with applications in domains such as computer vision and machine learning. With the rapidly increasing amount of data available, and their high dimensionality, k-NN algorithms scale poorly on multicore systems because they hit a memory wall. In this paper, we propose a novel data filtering strategy, named Subspace Clustering for Filtering (SCF), for k-NN search algorithms on multicore platforms. By excluding unlikely features in k-NN search, this strategy can reduce memory footprint as well as computation. Experimental results on four k-NN algorithms show that SCF can improve their performance on two modern multicore platforms with insignificant loss of search precision.
network based information systems | 2012
Yao Shen; Minjie Wang; Xiaoxin Tang; Yi Luo; Minyi Guo
Context-awareness has become a key issue in Human-Computer Interaction (HCI) for providing a better user experience in multi-device, multi-modal environments. With this intuition, we have proposed a web-service-based framework that associates interactions with services and provides a service selection mechanism that uses context knowledge to achieve smart interaction migration [1]. A fundamental problem of such a service-oriented framework for interaction migration is designing an effective yet scalable algorithm for service selection. In this paper, we propose a service selection algorithm that considers not only context information and user preferences but also inter-service relations such as relative location. Our algorithm detects interaction hot spots within the user's active scope and presents the best service combination based on an evaluation of interaction effectiveness. We also conduct simulations, and the results illustrate that our algorithm is effective and scalable for interaction service selection.
IEEE Transactions on Parallel and Distributed Systems | 2015
Xiaoxin Tang; Zhiyi Huang; David M. Eyers; Steven Mills; Minyi Guo
k Nearest Neighbors (k-NN) search is a widely used category of algorithms with applications in domains such as computer vision and machine learning. Despite the desire to process increasing amounts of high-dimensional data within these domains, k-NN algorithms scale poorly on multicore systems because they hit a memory wall. In this paper, we propose a novel data filtering strategy for k-NN search algorithms on multicore platforms. By excluding unlikely features during the k-NN search process, this strategy can reduce the amount of computation required as well as the memory footprint. It is complementary to the data selection strategies used in other state-of-the-art k-NN algorithms. A Subspace Clustering for Filtering (SCF) method is proposed to implement the data filtering strategy. Experimental results on four k-NN algorithms show that SCF can significantly improve their performance on three modern multicore platforms with only a small loss of search precision.
international parallel and distributed processing symposium | 2015
Xiaoxin Tang; Zhiyi Huang; David M. Eyers; Steven Mills; Minyi Guo
k Nearest Neighbours (k-NN) search is a fundamental problem in many computer vision and machine learning tasks. These tasks frequently involve a large number of high-dimensional vectors, which require intensive computations. Recent research work has shown that the Graphics Processing Unit (GPU) is a promising platform for solving k-NN search. However, these search algorithms often meet a serious bottleneck on GPUs due to a selection procedure, called k-selection, which is the final stage of k-NN and significantly affects the overall performance. In this paper, we propose new data structures and optimization techniques to accelerate k-selection on GPUs. Three key techniques are proposed: Merge Queue, Buffered Search and Hierarchical Partition. Compared with previous works, the proposed techniques can significantly improve the computing efficiency of k-selection on GPUs. Experimental results show that our techniques can achieve up to a 4.2× performance improvement over the state-of-the-art methods.
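To make the k-selection step concrete, the following is a minimal sequential sketch in Python, assuming the distances from one query to all reference vectors have already been computed. It uses a bounded max-heap, which is a generic CPU baseline and not the Merge Queue, Buffered Search or Hierarchical Partition techniques named in the abstract. The function name `k_select` is hypothetical.

```python
import heapq


def k_select(distances, k):
    """Return the k smallest (distance, index) pairs, sorted by ascending distance.

    Generic baseline only: a bounded max-heap on the CPU, not the GPU
    techniques proposed in the paper.
    """
    if k <= 0:
        return []
    heap = []  # max-heap emulated by negating distances; holds at most k entries
    for idx, d in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))
        elif -heap[0][0] > d:
            # The current worst of the k candidates is farther than d: replace it.
            heapq.heapreplace(heap, (-d, idx))
    return sorted((-neg_d, idx) for neg_d, idx in heap)


if __name__ == "__main__":
    print(k_select([5.0, 1.0, 3.0, 2.0, 4.0], 2))
```

Each query needs its own selection pass over all candidate distances, which suggests why this final stage can dominate once the distance computation itself is fast.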
trust, security and privacy in computing and communications | 2016
You Dai; Jin Yan; Xiaoxin Tang; Han Zhao; Minyi Guo
In this paper, we focus on designing an online credit card fraud detection framework with big data technologies, by which we want to achieve three major goals: 1) the ability to fuse multiple detection models to improve accuracy, 2) the ability to process large amounts of data and 3) the ability to perform detection in real time. To accomplish that, we propose a general workflow, which satisfies most design ideas of current credit card fraud detection systems. We further implement the workflow with a new framework which consists of four layers: a distributed storage layer, a batch training layer, a key-value sharing layer and a streaming detection layer. With the four layers, we are able to support massive trading data storage, fast detection model training, quick model data sharing and real-time online fraud detection, respectively. We implement it with recent big data technologies such as Hadoop, Spark, Storm and HBase. A prototype is implemented and tested with a synthetic dataset, which shows great potential for achieving the above goals.
international conference on parallel processing | 2013
Xiaoxin Tang; Steven Mills; David M. Eyers; Zhiyi Huang; Kai-Cheung Leung; Minyi Guo
Parallel programming is the mainstream for today's HPC applications. Programmers need to parallelize their programs to achieve better performance on multicore systems. However, due to a lack of good understanding of parallelism in algorithms, scheduling policy in runtime systems, and multicore architectures, programmers usually find it very hard to write high-performance, scalable programs on these parallel platforms. Although using a parallelized library written by experts can reduce the amount of coding work, our study shows that it does not automatically guarantee good performance. A better understanding of parallelism in algorithms, the OS/runtime systems, and hardware architectures is necessary if programmers wish to further improve performance. In this paper, we use SIFT-based feature matching within large-scale image collections to show the importance of three factors (the level of parallelism, scheduling policy, and memory architecture) that affect the performance of large-scale feature matching on multicore systems. We demonstrate experimental results using programs based on OpenCV and OpenMP, which are executed on both 16-core and 64-core machines. From our experimental results, we find that images with a large number of features achieve poor scalability on the 64-core machine due to poor cache utilization. To address this issue of cache performance, we propose a Divide-and-Merge algorithm that divides the feature space into several small sub-spaces so that they fit within the cache. Our experiments show that the performance tuning addressing all three factors improves the speedup of feature matching from 10.6× to 21.5× on the 64-core machine. While the speedup is improved by 103%, the scalability of the feature matching algorithm is improved by up to 6.45 times on the 64-core machine with our performance tuning. Our study indicates that performance tuning on multicore systems is very challenging even for a simple image processing algorithm.
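The divide-then-merge idea can be illustrated with a small Python sketch. It assumes brute-force nearest-neighbour matching and splits the reference features into fixed-size blocks, which is a simplification: the abstract does not specify how the feature space is divided, and pure Python will not exhibit the cache effects measured in the paper. All function names and the `block_size` parameter are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_in_block(query, block, offset):
    """Best (distance, global index) for one query within one block of references."""
    best = (float("inf"), -1)
    for i, ref in enumerate(block):
        d = sq_dist(query, ref)
        if d < best[0]:
            best = (d, offset + i)
    return best


def divide_and_merge_match(queries, refs, block_size):
    """Match every query to its nearest reference, one reference block at a time.

    Divide: the references are cut into blocks of block_size items (assumed
    small enough to stay in cache in a compiled implementation).
    Merge: the per-block winners are reduced into one global best per query.
    """
    best = [(float("inf"), -1)] * len(queries)
    for start in range(0, len(refs), block_size):
        block = refs[start:start + block_size]
        for q_idx, query in enumerate(queries):
            candidate = nearest_in_block(query, block, start)
            if candidate[0] < best[q_idx][0]:
                best[q_idx] = candidate
    return [idx for _, idx in best]


if __name__ == "__main__":
    refs = [(0, 0), (10, 10), (5, 5), (1, 1)]
    queries = [(0.9, 0.9), (9, 9)]
    print(divide_and_merge_match(queries, refs, block_size=2))
```

The result is the same for any block size; only the order in which memory is visited changes, which is the property a cache-oriented tuning of this kind relies on.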
image and vision computing new zealand | 2013
Steven Mills; David M. Eyers; Kai-Cheung Leung; Xiaoxin Tang; Zhiyi Huang
Feature matching is a fundamental problem in many computer vision tasks. As datasets become larger, and individual image resolution increases, this is becoming more and more computationally demanding work. While prior knowledge about the scene geometry can, in some cases, reduce the number of image pairs that need to be considered, the sheer volume of data means that parallel and distributed computing solutions must be considered. In this paper we examine the costs incurred in such solutions, and assess the way in which the problem scales with the number of cores within a single node, and the number of nodes in a distributed system. We also consider the role of heterogeneous systems, where nodes with different numbers and types of cores (including GPUs) are included in a distributed system. We show that distribution of this task across a cluster of machines has good (sometimes super-linear) scalability. However, scalability on many-core machines and GPU architectures is more limited, and is thus an important area for future research.
network-based information systems | 2011
Minjie Wang; Xiaoxin Tang; Yao Shen; Feilong Tang; Minyi Guo
Interaction migration systems for multi-device and multimodal environments have recently been in the spotlight. A web-service-based framework has been proposed to meet the requirements of such environments. In this paper, we propose a context-driven method of Human-Computer Interaction (HCI) service selection for a web-service-based interaction migration framework. We adopt a service-oriented concept, which associates interactions with several services, and provide a service selection mechanism to achieve interaction migration. Context-awareness is supported by collecting user and environment contexts and reacting appropriately to context changes. Moreover, user history is used to form user preferences, which aid the service selection process. We illustrate how our service selection method finds appropriate services in a typical smart home scenario.
programming models and applications for multicores and manycores | 2012
Xiaoxin Tang; Long Zheng; Jun Ma; Yao Shen; Li Li; Minyi Guo
Image recognition is a very useful technique that can be applied in many areas. Two-Dimensional Continuous Dynamic Programming (2DCDP) is a pixel-level matching algorithm for object recognition. Compared with other methods, 2DCDP can offer sufficiently high recognition accuracy without training. In our previous work we used 2DCDP to implement image classification. However, we found that the processing speed of 2DCDP is very slow. In this paper, we first analyze the performance issue of the 2DCDP algorithm and point out that large memory consumption is the performance bottleneck. Then, we improve the 2DCDP algorithm and propose a new object recognition algorithm named the Pixel-based Multi-Anchor (PMA) algorithm, which can locate anchor points that can be further used to locate the recognized area. Theoretical analysis shows that our new algorithm can effectively reduce the memory capacity requirement from O(n^4) to O(n^3), where n is the size of the image. Furthermore, based on our understanding of multi-core architecture, we propose a fine-grained parallelism thread model to parallelize our PMA algorithm on multi-core systems. In particular, taking the cache coherence problem into account, we further propose a coarse-grained parallelism thread model to optimize PMA performance. Experimental results show that, compared with the original 2DCDP algorithm, our new PMA algorithm can decrease the memory capacity requirement dramatically, which improves the recognition speed. More importantly, the PMA algorithm can efficiently process big images that exceed the capability of the original 2DCDP algorithm.
high performance computing and communications | 2013
Xiaoxin Tang; Steven Mills; David M. Eyers; Kai-Cheung Leung; Zhiyi Huang; Minyi Guo