Yuzhen Huang
The Chinese University of Hong Kong
Publication
Featured research published by Yuzhen Huang.
International Conference on Big Data | 2015
Huanhuan Wu; James Cheng; Yi Lu; Yiping Ke; Yuzhen Huang; Da Yan; Hejun Wu
Core decomposition has been widely applied in the visualization and analysis of massive networks. However, existing studies of core decomposition are limited to non-temporal graphs, while many real-world graphs are naturally modeled as temporal graphs (e.g., interactions between users at different times in online social networks, or phone call and messaging records between friends over time). In this paper, we define the problem of core decomposition in a temporal graph, propose efficient distributed algorithms to compute the cores in massive temporal graphs, and discuss how the technique can be used in temporal graph analysis.
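For readers unfamiliar with the technique, below is a minimal single-machine sketch of the classic core-decomposition peeling algorithm, together with one simple (assumed) way to apply it to a temporal graph by restricting to edges inside a time window. The paper's definition of temporal cores and its distributed algorithms are more involved.

    from collections import defaultdict

    def core_numbers(edges):
        # Build an undirected adjacency structure from (u, v) pairs.
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        deg = {v: len(ns) for v, ns in adj.items()}
        core, k = {}, 0
        while deg:
            u = min(deg, key=deg.get)  # peel a vertex of minimum remaining degree
            k = max(k, deg[u])         # the core number never decreases while peeling
            core[u] = k
            del deg[u]
            for w in adj[u]:
                if w in deg:
                    deg[w] -= 1
        return core

    def temporal_core_numbers(temporal_edges, t_start, t_end):
        # One simple reading (an assumption, not the paper's exact definition):
        # core numbers of the snapshot induced by edges active in [t_start, t_end].
        window = [(u, v) for (u, v, t) in temporal_edges if t_start <= t <= t_end]
        return core_numbers(window)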
International Conference on Data Engineering | 2016
Huanhuan Wu; Yuzhen Huang; James Cheng; Jinfeng Li; Yiping Ke
A temporal graph is a graph in which vertices communicate with each other at specific times, e.g., A calls B at 11 a.m. and talks for 7 minutes, which is modeled by an edge from A to B with starting time “11 a.m.” and duration “7 mins”. Temporal graphs can be used to model many networks with time-related activities, but efficient algorithms for analyzing them are severely lacking. We study fundamental problems such as answering reachability and time-based path queries in a temporal graph, and propose an efficient indexing technique specifically designed for processing these queries. Our results show that our method is efficient and scalable in both index construction and query processing.
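To make the data model concrete, the sketch below encodes each edge as a (u, v, start_time, duration) tuple and answers a time-respecting reachability query by brute-force relaxation; this per-query work is exactly what an index like the paper's is designed to avoid. The function and variable names are illustrative, not the paper's API.

    def reachable(temporal_edges, src, dst, t_start, t_end):
        # An edge (u, v, t, d) can be taken only if we are already at u by
        # time t, and it arrives at v at time t + d <= t_end. Relax until
        # no earliest-presence time improves.
        arrive = {src: t_start}
        changed = True
        while changed:
            changed = False
            for u, v, t, d in temporal_edges:
                if u in arrive and arrive[u] <= t and t + d <= t_end:
                    if t + d < arrive.get(v, float("inf")):
                        arrive[v] = t + d
                        changed = True
        return dst in arrive

    # The abstract's example: A calls B at 11 a.m. (minute 660) for 7 minutes.
    edges = [("A", "B", 660, 7)]
    print(reachable(edges, "A", "B", 0, 24 * 60))  # True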
Very Large Data Bases | 2017
Fan Yang; Fanhua Shang; Yuzhen Huang; James Cheng; Jinfeng Li; Yunjian Zhao; Ruihao Zhao
Tensors are higher-order generalizations of matrices for modeling multi-aspect data, e.g., a set of purchase records with the schema (user_id, product_id, timestamp, feedback). Tensor factorization is a powerful technique for generating a model from a tensor, just as matrix factorization generates a model from a matrix, but with higher accuracy and richer information, since a higher-order tensor carries more attributes than a matrix. The model obtained by tensor factorization can be used for classification, recommendation, anomaly detection, and so on. Despite this broad range of applications, tensor factorization has not been adopted as widely as matrix factorization, which is used extensively in recommender systems, mainly due to the high computational cost and poor scalability of existing tensor factorization methods. Efficient and scalable tensor factorization is particularly challenging because real-world tensor data are mostly sparse and massive. In this paper, we propose a novel distributed algorithm, called Lock-Free Tensor Factorization (LFTF), which significantly improves the efficiency and scalability of distributed tensor factorization by exploiting asynchronous execution on a re-formulated problem. Our experiments show that LFTF achieves much higher CPU and network throughput than existing methods, converges at least 17 times faster, and scales to much larger datasets.
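As background, the following is a minimal single-machine sketch of rank-R CP factorization of a sparse 3-way tensor by stochastic gradient descent, showing what tensor factorization computes. It is only an illustration of the problem: LFTF's contribution is a re-formulation that admits lock-free, asynchronous distributed execution, which this sketch does not attempt.

    import numpy as np

    def cp_sgd(entries, shape, rank=8, lr=0.01, reg=0.05, epochs=20, seed=0):
        # entries: list of ((i, j, k), value) records of a sparse 3-way tensor.
        # Approximate each value by sum_r U[i,r] * V[j,r] * W[k,r].
        rng = np.random.default_rng(seed)
        U = rng.normal(scale=0.1, size=(shape[0], rank))
        V = rng.normal(scale=0.1, size=(shape[1], rank))
        W = rng.normal(scale=0.1, size=(shape[2], rank))
        for _ in range(epochs):
            for (i, j, k), x in entries:
                err = np.dot(U[i] * V[j], W[k]) - x  # prediction error
                gu = err * V[j] * W[k] + reg * U[i]  # gradients with L2 regularization
                gv = err * U[i] * W[k] + reg * V[j]
                gw = err * U[i] * V[j] + reg * W[k]
                U[i] -= lr * gu
                V[j] -= lr * gv
                W[k] -= lr * gw
        return U, V, W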
International Conference on Management of Data | 2017
Fan Yang; Yuzhen Huang; Yunjian Zhao; Jinfeng Li; Guanxian Jiang; James Cheng
Coarse-grained operators such as map and reduce have been widely used for large-scale data processing. While they are easy to master, such over-simplified APIs can deny programmers fine-grained control over how computation is performed, and hence prevent them from designing more efficient algorithms. On the other hand, resorting to domain-specific languages (DSLs) is not a practical solution either, since programmers may need to learn many systems that differ greatly from one another, and the use of low-level tools can result in bug-prone programming. In our prior work [7], we proposed Husky, which provides a highly expressive API to resolve this dilemma. It allows developers to program in a variety of patterns, such as MapReduce, GAS, vertex-centric programs, and even asynchronous machine learning. While the Husky C++ engine provides great performance, in this demo proposal we introduce PyHusky and ScHusky, which allow users (e.g., data scientists) without systems knowledge or low-level programming skills to leverage the performance of Husky and build high-level applications with ease using Python and Scala.
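For contrast, here is the coarse-grained style the paragraph refers to: a word count written purely as map and reduce over partitioned data, in plain Python. This is a generic illustration of the pattern, not PyHusky's actual API.

    from collections import Counter
    from functools import reduce

    # Each inner list stands in for one partition of a distributed dataset.
    partitions = [["a b a"], ["b c"], ["a c c"]]

    def map_partition(lines):
        # Map step: count words locally within one partition.
        return Counter(w for line in lines for w in line.split())

    def merge(left, right):
        # Reduce step: combine two partial counts.
        left.update(right)
        return left

    word_counts = reduce(merge, (map_partition(p) for p in partitions), Counter())
    print(word_counts)  # Counter({'a': 3, 'c': 3, 'b': 2})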
International Conference on Big Data | 2016
Jinfeng Li; James Cheng; Yunjian Zhao; Fan Yang; Yuzhen Huang; Haipeng Chen; Ruihao Zhao
General-purpose distributed systems for data processing have become popular in recent years due to the high demand from industry for big data analytics. However, there is a lack of comprehensive comparison among these systems and detailed analysis of their performance. In this paper, we conduct an extensive performance study of four state-of-the-art general-purpose distributed computing systems. Our results reveal useful insights on their design and implementation, which can help improve existing systems and guide the development of better new ones.
IEEE Transactions on Knowledge and Data Engineering | 2016
Huanhuan Wu; James Cheng; Yiping Ke; Silu Huang; Yuzhen Huang; Hejun Wu
Shortest path is a fundamental graph problem with numerous applications. However, the classic concept of shortest path is insufficient for temporal graphs. In this paper, we study various notions of “shortest” path in temporal graphs, called minimum temporal paths. Computing these minimum temporal paths is challenging because a subpath of a “shortest” path may not itself be “shortest” in a temporal graph. We propose efficient algorithms to compute minimum temporal paths and verify their efficiency using large real-world temporal graphs.
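As one example of the notions studied, the sketch below computes earliest-arrival times with a single pass over edges sorted by departure time (assuming positive edge durations). It is a simplified textbook version, not the paper's exact algorithms, which also cover other notions such as latest-departure, fastest, and shortest-duration paths.

    def earliest_arrival_times(temporal_edges, source, t_alpha, t_omega):
        # Each edge (u, v, t, d) departs u at time t and arrives at v at t + d.
        # With edges scanned in non-decreasing order of departure time t (and
        # positive durations), the arrival time at an edge's tail is already
        # final when the edge is scanned, so one pass suffices.
        arrive = {source: t_alpha}
        for u, v, t, d in sorted(temporal_edges, key=lambda e: e[2]):
            if arrive.get(u, float("inf")) <= t and t + d <= t_omega:
                if t + d < arrive.get(v, float("inf")):
                    arrive[v] = t + d
        return arrive  # earliest-arrival time per reachable vertex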
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2017
Jinfeng Li; James Cheng; Fan Yang; Yuzhen Huang; Yunjian Zhao; Xiao Yan; Ruihao Zhao
Locality Sensitive Hashing (LSH) algorithms are widely adopted to index similar items in high-dimensional space for approximate nearest neighbor search. As the volume of real-world datasets keeps growing, it has become necessary to develop distributed LSH solutions. Implementing a distributed LSH algorithm from scratch incurs high development costs, so most existing solutions are built on general-purpose platforms such as Hadoop and Spark. However, we argue that these platforms are both hard to use for programming LSH algorithms and inefficient for LSH computation. We propose LoSHa, a distributed computing framework that reduces the development cost through a tailor-made yet general programming interface, and achieves high efficiency through LSH-specific system implementation and optimizations. We show that many LSH algorithms can be easily expressed in LoSHa's API. We evaluate LoSHa and compare it with general-purpose platforms running the same LSH algorithms. Our results show that LoSHa can be an order of magnitude faster, while the implementations on LoSHa are even more intuitive and require only a few lines of code.
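For context, here is a minimal single-machine sketch of one classic LSH scheme, random-hyperplane hashing for cosine similarity: points whose bit signatures collide land in the same bucket and become candidate near neighbors. This illustrates the kind of algorithm LoSHa is built to express and distribute; it is not LoSHa's API.

    import numpy as np
    from collections import defaultdict

    def lsh_buckets(X, n_planes=16, seed=0):
        # Each row of X gets an n_planes-bit signature: the sign pattern of
        # its projections onto random hyperplanes. Rows with identical
        # signatures fall into the same candidate bucket.
        rng = np.random.default_rng(seed)
        planes = rng.normal(size=(n_planes, X.shape[1]))
        bits = (X @ planes.T) >= 0                # (n_points, n_planes) booleans
        sigs = bits @ (1 << np.arange(n_planes))  # pack each bit row into an int
        buckets = defaultdict(list)
        for idx, sig in enumerate(sigs):
            buckets[int(sig)].append(idx)
        return buckets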
arXiv: Distributed, Parallel, and Cluster Computing | 2016
Da Yan; Yuzhen Huang; James Cheng; Huanhuan Wu
IEEE Transactions on Parallel and Distributed Systems | 2018
Da Yan; Yuzhen Huang; Miao Liu; Hongzhi Chen; James Cheng; Huanhuan Wu; Chengcui Zhang
Very Large Data Bases | 2018
Yuzhen Huang; Tatiana Jin; Yidi Wu; Zhenkun Cai; Xiao Yan; Fan Yang; Jinfeng Li; Yuying Guo; James Cheng