Ce Zhang
ETH Zurich
Publications
Featured research published by Ce Zhang.
international conference on management of data | 2017
Jiawei Jiang; Bin Cui; Ce Zhang; Lele Yu
We study distributed machine learning in heterogeneous environments. We first conduct a systematic study of existing systems running distributed stochastic gradient descent (SGD) and find that, although these systems work well in homogeneous environments, they can suffer performance degradation of up to 10x in heterogeneous environments, where stragglers are common, because their synchronization protocols do not fit a heterogeneous setting. Our first contribution is a heterogeneity-aware algorithm that applies a constant learning rate schedule to updates before adding them to the global parameter. This suppresses the harm stragglers cause to robust convergence. As a further improvement, our second contribution is a more sophisticated learning rate schedule that takes into account how delayed each update is. We prove the convergence of both approaches theoretically and implement a prototype system in the production cluster of our industrial partner, Tencent Inc. We validate the prototype on a range of machine-learning workloads: it is 2-12x faster than other state-of-the-art systems such as Spark, Petuum, and TensorFlow, and our proposed algorithm takes up to 6x fewer iterations to converge.
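The abstract does not give the exact learning rate schedules, so the following is only a minimal single-process sketch of the idea behind the second contribution. It assumes a hypothetical schedule that divides a base learning rate by (staleness + 1), where staleness is the number of global updates applied since a worker last read the parameters. `ParameterServer`, `pull`, `push`, and the `staleness_aware` flag are illustrative names, not the prototype's API; with `staleness_aware=False` every update uses the same constant rate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ParameterServer:
    """Toy parameter server; the staleness-scaled rate is an assumed schedule."""
    params: List[float]
    base_lr: float = 0.1
    version: int = 0  # number of updates applied to the global parameters so far

    def pull(self) -> Tuple[List[float], int]:
        # A worker reads a copy of the parameters together with their version.
        return list(self.params), self.version

    def push(self, grad: List[float], read_version: int,
             staleness_aware: bool = True) -> float:
        # Staleness: how many updates landed since this worker read the parameters.
        staleness = self.version - read_version
        # Hypothetical delay-aware schedule: shrink the step for stale gradients.
        lr = self.base_lr / (staleness + 1) if staleness_aware else self.base_lr
        self.params = [p - lr * g for p, g in zip(self.params, grad)]
        self.version += 1
        return lr  # returned so callers can inspect the rate actually used
```

For example, if a straggler reads the parameters, a fast worker then pushes three fresh gradients, and the straggler finally pushes its gradient, that stale update is applied with a quarter of the base rate, so it moves the parameters less than the fresh ones did.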
very large data bases | 2017
Xupeng Li; Bin Cui; Yiru Chen; Wentao Wu; Ce Zhang
very large data bases | 2017
Zhipeng Zhang; Yingxia Shao; Bin Cui; Ce Zhang
international conference on data engineering | 2017
Jie Jiang; Jiawei Jiang; Bin Cui; Ce Zhang
field programmable custom computing machines | 2017
Kaan Kara; Dan Alistarh; Gustavo Alonso; Onur Mutlu; Ce Zhang
international conference on big data | 2016
Heqing Huang; Cong Zheng; Junyuan Zeng; Wu Zhou; Sencun Zhu; Peng Liu; Suresh Chari; Ce Zhang
very large data bases | 2017
Lele Yu; Ce Zhang; Yingxia Shao; Bin Cui
Monthly Notices of the Royal Astronomical Society | 2017
Kevin Schawinski; Ce Zhang; Hantian Zhang; Lucas Fowler; Gokula Krishnan Santhanam
Observations of astrophysical objects such as galaxies are limited by various sources of random and systematic noise from the sky background, the optical system of the telescope, and the detector used to record the data. Conventional deconvolution techniques are limited in their ability to recover features in imaging data by the Shannon-Nyquist sampling theorem. Here we train a generative adversarial network (GAN) on a sample of 4,550 images of nearby galaxies at 0.01 < z < 0.02 from the Sloan Digital Sky Survey and conduct 10-fold cross-validation to evaluate the results. We present a method using a GAN trained on galaxy images that can recover features from artificially degraded images with worse seeing and higher noise than the original, with a performance that far exceeds simple deconvolution. The ability to better recover detailed features such as galaxy morphology from low-signal-to-noise and low-angular-resolution imaging data significantly increases our ability to study existing data sets of astrophysical objects, as well as future observations with observatories such as the Large Synoptic Survey Telescope (LSST) and the Hubble and James Webb space telescopes.
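The galaxy-imaging abstract does not say how the images were degraded. As a minimal sketch of that data-preparation step only (not the GAN itself), the following assumes that worse seeing can be modelled as convolution with a wider Gaussian point-spread function and higher noise as additive Gaussian pixel noise. `gaussian_kernel`, `convolve`, `degrade`, and the default `sigma` and `noise_std` values are hypothetical; plain Python lists keep it dependency-free at the cost of speed.

```python
import math
import random
from typing import List

Image = List[List[float]]


def gaussian_kernel(sigma: float, radius: int) -> Image:
    # Normalized 2D Gaussian used as a stand-in point-spread function (assumption).
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-radius, radius + 1)]
         for y in range(-radius, radius + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def convolve(img: Image, kernel: Image) -> Image:
    # Zero-padded 2D convolution; the Gaussian kernel is symmetric, so no flip is needed.
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += img[y][x] * kernel[di + r][dj + r]
            out[i][j] = acc
    return out


def degrade(img: Image, sigma: float = 1.5, noise_std: float = 0.05,
            seed: int = 0) -> Image:
    # Worse seeing: blur with a wider PSF. Higher noise: add Gaussian pixel noise.
    rng = random.Random(seed)
    blurred = convolve(img, gaussian_kernel(sigma, radius=int(3 * sigma)))
    return [[v + rng.gauss(0.0, noise_std) for v in row] for row in blurred]
```

Pairs of (degraded, original) images produced this way are the kind of training input a GAN could learn to invert; the fixed `seed` makes the added noise reproducible, and with `noise_std=0.0` the blur alone conserves total flux away from the image edges.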
international conference on management of data | 2017
Ce Zhang; Wentao Wu; Tian Li
We demonstrate MLog, a high-level language that integrates machine learning into data management systems. Unlike existing machine learning frameworks (e.g., TensorFlow, Theano, and Caffe), MLog is declarative: the system manages all data movement, data persistence, and machine-learning-related optimizations (such as data batching) automatically. Our interactive demonstration will show the audience how this is achieved using the novel notion of tensoral views (TViews), which are similar to relational views but operate over tensors with linear algebra. With MLog, users can succinctly specify not only simple models such as SVM (in just two lines) but also sophisticated deep learning models that are not supported by existing in-database analytics systems (e.g., MADlib, PAL, and SciDB), as a series of cascaded TViews. Given the declarative nature of MLog, we further demonstrate how query/program optimization techniques can be leveraged to translate MLog programs into native TensorFlow programs. The performance of the automatically generated TensorFlow programs is comparable to that of hand-optimized ones.
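The abstract does not show MLog syntax, so the following is only a Python analogy for the TView idea, not MLog code. Each hypothetical `TView` stores how to derive its value from base tensors and recomputes it on demand, much like a non-materialized relational view, and three cascaded views express the average hinge loss of a linear SVM. The class, the `matvec` helper, and the base-tensor names `X`, `y`, and `w` are all assumptions.

```python
from typing import Any, Callable, Dict, List


class TView:
    """Hypothetical lazily evaluated view over named base tensors (not MLog's API)."""

    def __init__(self, fn: Callable[[Dict[str, Any]], Any]):
        self.fn = fn  # how to derive this view from the base data

    def eval(self, base: Dict[str, Any]) -> Any:
        # Like a non-materialized relational view: recompute from current base data.
        return self.fn(base)


def matvec(X: List[List[float]], w: List[float]) -> List[float]:
    # Dense matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, w)) for row in X]


# Cascaded views that together define the average SVM hinge loss.
scores = TView(lambda b: matvec(b["X"], b["w"]))
margins = TView(lambda b: [y * s for y, s in zip(b["y"], scores.eval(b))])
hinge = TView(lambda b: sum(max(0.0, 1.0 - m) for m in margins.eval(b)) / len(b["y"]))
```

Because the views hold no data of their own, replacing `w` in the base dictionary and calling `hinge.eval` again yields the updated loss; keeping the definitions separate from evaluation is what leaves a declarative system room for the automatic optimizations the abstract describes.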
bioRxiv | 2018
Min Su; Hantian Zhang; Kevin Schawinski; Ce Zhang; Michael A. Cianfrocco
Given a graph, SimRank is one of the most popular measures of the similarity between two vertices. We focus on efficiently calculating SimRank, which has been studied intensively over the last decade. This has led researchers to propose many algorithms that efficiently calculate or approximate SimRank. Despite these abundant research efforts, there is no systematic comparison of these algorithms. In this paper, we conduct a study comparing these algorithms to understand their pros and cons.
We first introduce a taxonomy of SimRank algorithms and classify each algorithm into one of three classes: iterative, non-iterative, and random-walk-based methods. We implement ten algorithms published from 2002 to 2015 and compare them on synthetic and real-world graphs. To ensure a fair study, our implementations use the same data structure and execution framework, and we try our best to optimize each algorithm. Our study reveals that none of these algorithms dominates the others: iterative algorithms often have higher accuracy, while random-walk-based algorithms can be more scalable. One non-iterative algorithm is both effective and efficient on medium-sized graphs. Thus, the optimal choice of algorithm depends on the requirements of each application. This paper provides an empirical guideline for making such choices.
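For reference, the basic iterative method (the first class in the taxonomy) follows directly from SimRank's recursive definition: s(a, a) = 1 and, for a ≠ b, s(a, b) = C / (|I(a)| |I(b)|) times the sum of s(i, j) over all in-neighbours i of a and j of b, with s(a, b) = 0 when either vertex has no in-neighbours. The sketch below simply iterates that definition for a fixed number of rounds. It is not one of the ten algorithms evaluated in the paper, and the decay factor `C = 0.8` and `iters = 10` are illustrative defaults.

```python
from typing import Dict, List, Tuple


def simrank(in_nbrs: Dict[str, List[str]], C: float = 0.8,
            iters: int = 10) -> Dict[Tuple[str, str], float]:
    """Naive iterative SimRank; each round costs O(n^2 * d^2) for average in-degree d."""
    nodes = list(in_nbrs)
    # s_0: every vertex is maximally similar to itself and to nothing else.
    s = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        nxt: Dict[Tuple[str, str], float] = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    nxt[(a, b)] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    nxt[(a, b)] = 0.0  # no in-neighbours: similarity defined as 0
                else:
                    # Average similarity over all pairs of in-neighbours, decayed by C.
                    total = sum(s[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
                    nxt[(a, b)] = C * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        s = nxt
    return s
```

For instance, if a single vertex `u` points to both `a` and `b`, then s(a, b) = C · s(u, u) = 0.8. The quadratic number of vertex pairs per round is why the optimized and random-walk-based methods compared in the study matter on large graphs.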