Yuzhen Huang
The Chinese University of Hong Kong
Publication
Featured research published by Yuzhen Huang.
International Conference on Big Data | 2015
Huanhuan Wu; James Cheng; Yi Lu; Yiping Ke; Yuzhen Huang; Da Yan; Hejun Wu
Core decomposition has been widely applied in the visualization and analysis of massive networks. However, existing studies of core decomposition are limited to non-temporal graphs, while many real-world graphs are naturally modeled as temporal graphs (e.g., interactions between users at different times in online social networks, or phone call and messaging records between friends over time). In this paper, we define the problem of core decomposition in a temporal graph, propose efficient distributed algorithms to compute the cores in massive temporal graphs, and discuss how the technique can be used in temporal graph analysis.
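For readers unfamiliar with the technique, below is a minimal single-machine sketch of the classic core-decomposition peeling algorithm, together with one simple (assumed) way to apply it to a temporal graph by restricting to edges inside a time window. The paper's definition of temporal cores and its distributed algorithms are more involved.

    from collections import defaultdict

    def core_numbers(edges):
        # Build an undirected adjacency structure from (u, v) pairs.
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        deg = {v: len(ns) for v, ns in adj.items()}
        core, k = {}, 0
        while deg:
            u = min(deg, key=deg.get)  # peel a vertex of minimum remaining degree
            k = max(k, deg[u])         # the core number never decreases while peeling
            core[u] = k
            del deg[u]
            for w in adj[u]:
                if w in deg:
                    deg[w] -= 1
        return core

    def temporal_core_numbers(temporal_edges, t_start, t_end):
        # One simple reading (an assumption, not the paper's exact definition):
        # core numbers of the snapshot induced by edges active in [t_start, t_end].
        window = [(u, v) for (u, v, t) in temporal_edges if t_start <= t <= t_end]
        return core_numbers(window)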
International Conference on Data Engineering | 2016
Huanhuan Wu; Yuzhen Huang; James Cheng; Jinfeng Li; Yiping Ke
A temporal graph is a graph in which vertices communicate with each other at specific times, e.g., A calls B at 11 a.m. and talks for 7 minutes, which is modeled by an edge from A to B with starting time “11 a.m.” and duration “7 mins”. Temporal graphs can be used to model many networks with time-related activities, but efficient algorithms for analyzing them are severely lacking. We study fundamental problems such as answering reachability and time-based path queries in a temporal graph, and propose an efficient indexing technique specifically designed for processing these queries. Our results show that our method is efficient and scalable in both index construction and query processing.
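To make the data model concrete, the sketch below encodes each edge as a (u, v, start_time, duration) tuple and answers a time-respecting reachability query by brute-force relaxation; this per-query work is exactly what an index like the paper's is designed to avoid. The function and variable names are illustrative, not the paper's API.

    def reachable(temporal_edges, src, dst, t_start, t_end):
        # An edge (u, v, t, d) can be taken only if we are already at u by
        # time t, and it arrives at v at time t + d <= t_end. Relax until
        # no earliest-presence time improves.
        arrive = {src: t_start}
        changed = True
        while changed:
            changed = False
            for u, v, t, d in temporal_edges:
                if u in arrive and arrive[u] <= t and t + d <= t_end:
                    if t + d < arrive.get(v, float("inf")):
                        arrive[v] = t + d
                        changed = True
        return dst in arrive

    # The abstract's example: A calls B at 11 a.m. (minute 660) for 7 minutes.
    edges = [("A", "B", 660, 7)]
    print(reachable(edges, "A", "B", 0, 24 * 60))  # True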
Very Large Data Bases | 2017
Fan Yang; Fanhua Shang; Yuzhen Huang; James Cheng; Jinfeng Li; Yunjian Zhao; Ruihao Zhao
Tensors are higher-order generalizations of matrices for modeling multi-aspect data, e.g., a set of purchase records with the schema (user_id, product_id, timestamp, feedback). Tensor factorization is a powerful technique for generating a model from a tensor, just as matrix factorization generates a model from a matrix, but with higher accuracy and richer information, since a higher-order tensor carries more attributes than a matrix. The model obtained by tensor factorization can be used for classification, recommendation, anomaly detection, and so on. Despite this broad range of applications, tensor factorization has not been adopted as widely as matrix factorization, which is used extensively in recommender systems, mainly due to the high computational cost and poor scalability of existing tensor factorization methods. Efficient and scalable tensor factorization is particularly challenging because real-world tensor data are mostly sparse and massive. In this paper, we propose a novel distributed algorithm, called Lock-Free Tensor Factorization (LFTF), which significantly improves the efficiency and scalability of distributed tensor factorization by exploiting asynchronous execution on a re-formulated problem. Our experiments show that LFTF achieves much higher CPU and network throughput than existing methods, converges at least 17 times faster, and scales to much larger datasets.
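As background, the following is a minimal single-machine sketch of rank-R CP factorization of a sparse 3-way tensor by stochastic gradient descent, showing what tensor factorization computes. It is only an illustration of the problem: LFTF's contribution is a re-formulation that admits lock-free, asynchronous distributed execution, which this sketch does not attempt.

    import numpy as np

    def cp_sgd(entries, shape, rank=8, lr=0.01, reg=0.05, epochs=20, seed=0):
        # entries: list of ((i, j, k), value) records of a sparse 3-way tensor.
        # Approximate each value by sum_r U[i,r] * V[j,r] * W[k,r].
        rng = np.random.default_rng(seed)
        U = rng.normal(scale=0.1, size=(shape[0], rank))
        V = rng.normal(scale=0.1, size=(shape[1], rank))
        W = rng.normal(scale=0.1, size=(shape[2], rank))
        for _ in range(epochs):
            for (i, j, k), x in entries:
                err = np.dot(U[i] * V[j], W[k]) - x  # prediction error
                gu = err * V[j] * W[k] + reg * U[i]  # gradients with L2 regularization
                gv = err * U[i] * W[k] + reg * V[j]
                gw = err * U[i] * V[j] + reg * W[k]
                U[i] -= lr * gu
                V[j] -= lr * gv
                W[k] -= lr * gw
        return U, V, W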
International Conference on Management of Data | 2017
Fan Yang; Yuzhen Huang; Yunjian Zhao; Jinfeng Li; Guanxian Jiang; James Cheng
Coarse-grained operators such as map and reduce have been widely used for large-scale data processing. While they are easy to master, such over-simplified APIs can deny programmers fine-grained control over how computation is performed, and hence prevent them from designing more efficient algorithms. On the other hand, resorting to domain-specific languages (DSLs) is not a practical solution either, since programmers may need to learn many systems that differ greatly from one another, and the use of low-level tools can result in bug-prone programming. In our prior work [7], we proposed Husky, which provides a highly expressive API to resolve this dilemma. It allows developers to program in a variety of patterns, such as MapReduce, GAS, vertex-centric programs, and even asynchronous machine learning. While the Husky C++ engine provides great performance, in this demo proposal we introduce PyHusky and ScHusky, which allow users (e.g., data scientists) without systems knowledge or low-level programming skills to leverage the performance of Husky and build high-level applications with ease using Python and Scala.
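For contrast, here is the coarse-grained style the paragraph refers to: a word count written purely as map and reduce over partitioned data, in plain Python. This is a generic illustration of the pattern, not PyHusky's actual API.

    from collections import Counter
    from functools import reduce

    # Each inner list stands in for one partition of a distributed dataset.
    partitions = [["a b a"], ["b c"], ["a c c"]]

    def map_partition(lines):
        # Map step: count words locally within one partition.
        return Counter(w for line in lines for w in line.split())

    def merge(left, right):
        # Reduce step: combine two partial counts.
        left.update(right)
        return left

    word_counts = reduce(merge, (map_partition(p) for p in partitions), Counter())
    print(word_counts)  # Counter({'a': 3, 'c': 3, 'b': 2})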
International Conference on Big Data | 2016
Jinfeng Li; James Cheng; Yunjian Zhao; Fan Yang; Yuzhen Huang; Haipeng Chen; Ruihao Zhao
General-purpose distributed systems for data processing have become popular in recent years due to the high demand from industry for big data analytics. However, there is a lack of comprehensive comparison among these systems and detailed analysis of their performance. In this paper, we conduct an extensive performance study of four state-of-the-art general-purpose distributed computing systems. Our results reveal useful insights on their design and implementation, which can help improve existing systems and guide the development of better new ones.
IEEE Transactions on Knowledge and Data Engineering | 2016
Huanhuan Wu; James Cheng; Yiping Ke; Silu Huang; Yuzhen Huang; Hejun Wu
Shortest path is a fundamental graph problem with numerous applications. However, the classic concept of shortest path is insufficient for temporal graphs. In this paper, we study various notions of “shortest” path in temporal graphs, called minimum temporal paths. Computing these minimum temporal paths is challenging because a subpath of a “shortest” path may not itself be “shortest” in a temporal graph. We propose efficient algorithms to compute minimum temporal paths and verify their efficiency using large real-world temporal graphs.
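As one example of the notions studied, the sketch below computes earliest-arrival times with a single pass over edges sorted by departure time (assuming positive edge durations). It is a simplified textbook version, not the paper's exact algorithms, which also cover other notions such as latest-departure, fastest, and shortest-duration paths.

    def earliest_arrival_times(temporal_edges, source, t_alpha, t_omega):
        # Each edge (u, v, t, d) departs u at time t and arrives at v at t + d.
        # With edges scanned in non-decreasing order of departure time t (and
        # positive durations), the arrival time at an edge's tail is already
        # final when the edge is scanned, so one pass suffices.
        arrive = {source: t_alpha}
        for u, v, t, d in sorted(temporal_edges, key=lambda e: e[2]):
            if arrive.get(u, float("inf")) <= t and t + d <= t_omega:
                if t + d < arrive.get(v, float("inf")):
                    arrive[v] = t + d
        return arrive  # earliest-arrival time per reachable vertex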
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2017
Jinfeng Li; James Cheng; Fan Yang; Yuzhen Huang; Yunjian Zhao; Xiao Yan; Ruihao Zhao
Locality Sensitive Hashing (LSH) algorithms are widely adopted to index similar items in high-dimensional space for approximate nearest neighbor search. As the volume of real-world datasets keeps growing, it has become necessary to develop distributed LSH solutions. Implementing a distributed LSH algorithm from scratch incurs high development costs, so most existing solutions are built on general-purpose platforms such as Hadoop and Spark. However, we argue that these platforms are both hard to use for programming LSH algorithms and inefficient for LSH computation. We propose LoSHa, a distributed computing framework that reduces the development cost through a tailor-made yet general programming interface, and achieves high efficiency through LSH-specific system implementation and optimizations. We show that many LSH algorithms can be easily expressed in LoSHa's API. We evaluate LoSHa and compare it with general-purpose platforms running the same LSH algorithms. Our results show that LoSHa can be an order of magnitude faster, while the implementations on LoSHa are even more intuitive and require only a few lines of code.
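For context, here is a minimal single-machine sketch of one classic LSH scheme, random-hyperplane hashing for cosine similarity: points whose bit signatures collide land in the same bucket and become candidate near neighbors. This illustrates the kind of algorithm LoSHa is built to express and distribute; it is not LoSHa's API.

    import numpy as np
    from collections import defaultdict

    def lsh_buckets(X, n_planes=16, seed=0):
        # Each row of X gets an n_planes-bit signature: the sign pattern of
        # its projections onto random hyperplanes. Rows with identical
        # signatures fall into the same candidate bucket.
        rng = np.random.default_rng(seed)
        planes = rng.normal(size=(n_planes, X.shape[1]))
        bits = (X @ planes.T) >= 0                # (n_points, n_planes) booleans
        sigs = bits @ (1 << np.arange(n_planes))  # pack each bit row into an int
        buckets = defaultdict(list)
        for idx, sig in enumerate(sigs):
            buckets[int(sig)].append(idx)
        return buckets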
arXiv: Distributed, Parallel, and Cluster Computing | 2016
Da Yan; Yuzhen Huang; James Cheng; Huanhuan Wu
IEEE Transactions on Parallel and Distributed Systems | 2018
Da Yan; Yuzhen Huang; Miao Liu; Hongzhi Chen; James Cheng; Huanhuan Wu; Chengcui Zhang
Very Large Data Bases | 2018
Yuzhen Huang; Tatiana Jin; Yidi Wu; Zhenkun Cai; Xiao Yan; Fan Yang; Jinfeng Li; Yuying Guo; James Cheng