Ce Zhang | Researchain

Publication

Featured research published by Ce Zhang.

international conference on management of data | 2017

Heterogeneity-aware Distributed Parameter Servers

Jiawei Jiang; Bin Cui; Ce Zhang; Lele Yu

In this work, we study distributed machine learning in heterogeneous environments. We first conduct a systematic study of existing systems running distributed stochastic gradient descent and find that, although these systems work well in homogeneous environments, they can suffer performance degradation of up to 10x in heterogeneous environments, where stragglers are common, because their synchronization protocols do not fit a heterogeneous setting. Our first contribution is a heterogeneity-aware algorithm that applies a constant learning rate schedule to updates before adding them to the global parameter. This allows us to suppress the harm that stragglers do to convergence. As a further improvement, our second contribution is a more sophisticated learning rate schedule that takes into account the delay of each update. We theoretically prove that both approaches converge and implement a prototype system in the production cluster of our industrial partner Tencent Inc. We validate the performance of this prototype using a range of machine-learning workloads. Our prototype is 2-12x faster than other state-of-the-art systems, such as Spark, Petuum, and TensorFlow, and our proposed algorithm takes up to 6x fewer iterations to converge.
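To make the two learning rate schedules described in this abstract more concrete, the following is a minimal, single-process sketch of an asynchronous parameter server with one fast worker and one straggler. Assumptions: the class `StalenessAwareServer`, the `simulate` helper, and the damping rule `lr = base_lr / (1 + staleness)` are hypothetical illustrations. The rule is a common staleness-aware heuristic, not necessarily the schedule used in the paper, and "staleness" here means the number of updates the server applied between a worker's read and its push. Setting `delay_aware=False` keeps a constant step size for every update as a baseline.

```python
import numpy as np


class StalenessAwareServer:
    """Hypothetical single-process parameter server (illustration only)."""

    def __init__(self, dim, base_lr=0.1, delay_aware=True):
        self.params = np.zeros(dim)
        self.version = 0  # number of updates applied so far (global clock)
        self.base_lr = base_lr
        self.delay_aware = delay_aware

    def pull(self):
        # A worker copies the current parameters and records their version.
        return self.params.copy(), self.version

    def push(self, grad, read_version):
        # Staleness: how many updates were applied since this worker's read.
        staleness = self.version - read_version
        if self.delay_aware:
            lr = self.base_lr / (1 + staleness)  # assumed damping rule
        else:
            lr = self.base_lr  # constant step size (baseline)
        self.params -= lr * grad
        self.version += 1
        return lr


def simulate(delay_aware, steps=200, slow_every=5, target=3.0):
    """Toy run on f(w) = 0.5 * (w - target)**2 with one fast and one slow worker.

    The fast worker pulls and pushes every tick; the slow (straggler) worker
    pushes only every `slow_every` ticks, using parameters it read earlier.
    """
    server = StalenessAwareServer(dim=1, base_lr=0.5, delay_aware=delay_aware)
    slow_w, slow_v = server.pull()
    for t in range(1, steps + 1):
        w, v = server.pull()
        server.push(w - target, v)  # fresh gradient from the fast worker
        if t % slow_every == 0:
            server.push(slow_w - target, slow_v)  # stale gradient from the straggler
            slow_w, slow_v = server.pull()
    return float(server.params[0])
```

On this toy quadratic, `simulate(True, steps=20)` ends closer to the target of 3.0 than `simulate(False, steps=20)`, because the straggler's stale gradient is scaled down instead of overshooting. Both variants still converge here given enough steps, and the example says nothing about the speedups measured in the paper.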

Monthly Notices of the Royal Astronomical Society | 2017

Generative adversarial networks recover features in astrophysical images of galaxies beyond the deconvolution limit

Kevin Schawinski; Ce Zhang; Hantian Zhang; Lucas Fowler; Gokula Krishnan Santhanam

Observations of astrophysical objects such as galaxies are limited by various sources of random and systematic noise from the sky background, the optical system of the telescope and the detector used to record the data. Conventional deconvolution techniques are limited in their ability to recover features in imaging data by the Shannon-Nyquist sampling theorem. Here we train a generative adversarial network (GAN) on a sample of 4,550 images of nearby galaxies at 0.01 < z < 0.02 from the Sloan Digital Sky Survey and conduct 10× cross validation to evaluate the results. We present a method using a GAN trained on galaxy images that can recover features from artificially degraded images with worse seeing and higher noise than the original with a performance which far exceeds simple deconvolution. The ability to better recover detailed features such as galaxy morphology from low-signal-to-noise and low angular resolution imaging data significantly increases our ability to study existing data sets of astrophysical objects as well as future observations with observatories such as the Large Synoptic Survey Telescope (LSST) and the Hubble and James Webb space telescopes.

very large data bases | 2017

MLog: towards declarative in-database machine learning

Xupeng Li; Bin Cui; Yiru Chen; Wentao Wu; Ce Zhang

We demonstrate MLog, a high-level language that integrates machine learning into data management systems. Unlike existing machine learning frameworks (e.g., TensorFlow, Theano, and Caffe), MLog is declarative, in the sense that the system manages all data movement, data persistence, and machine-learning related optimizations (such as data batching) automatically. Our interactive demonstration will show the audience how this is achieved based on the novel notion of tensoral views (TViews), which are similar to relational views but operate over tensors with linear algebra. With MLog, users can succinctly specify not only simple models such as SVM (in just two lines), but also sophisticated deep learning models that are not supported by existing in-database analytics systems (e.g., MADlib, PAL, and SciDB), as a series of cascaded TViews. Given the declarative nature of MLog, we further demonstrate how query/program optimization techniques can be leveraged to translate MLog programs into native TensorFlow programs. The performance of the automatically generated TensorFlow programs is comparable to that of hand-optimized ones.

very large data bases | 2017

An experimental evaluation of simrank-based similarity search algorithms

Zhipeng Zhang; Yingxia Shao; Bin Cui; Ce Zhang

Given a graph, SimRank is one of the most popular measures of the similarity between two vertices. We focus on efficiently calculating SimRank, which has been studied intensively over the last decade. Researchers have consequently proposed many algorithms that calculate or approximate SimRank efficiently. Despite these abundant research efforts, there is no systematic comparison of these algorithms. In this paper, we conduct a study to compare these algorithms to understand their pros and cons.

We first introduce a taxonomy for different algorithms that calculate SimRank and classify each algorithm into one of the following three classes, namely iterative, non-iterative, and random-walk-based methods. We implement ten algorithms published from 2002 to 2015, and compare them using synthetic and real-world graphs. To ensure the fairness of our study, our implementations use the same data structure and execution framework, and we optimize each of these algorithms as far as we can. Our study reveals that none of these algorithms dominates the others: algorithms based on iterative methods often have higher accuracy, while algorithms based on random walks can be more scalable. One non-iterative algorithm is both effective and efficient on medium-sized graphs. Thus, depending on the requirements of different applications, the optimal choice of algorithm differs. This paper provides an empirical guideline for making such choices.

international conference on data engineering | 2017

TencentBoost: A Gradient Boosting Tree System with Parameter Server

Jie Jiang; Jiawei Jiang; Bin Cui; Ce Zhang

field programmable custom computing machines | 2017

FPGA-Accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-Off

Kaan Kara; Dan Alistarh; Gustavo Alonso; Onur Mutlu; Ce Zhang

international conference on big data | 2016

Android malware development on public malware scanning platforms: A large-scale data-driven study

Heqing Huang; Cong Zheng; Junyuan Zeng; Wu Zhou; Sencun Zhu; Peng Liu; Suresh Chari; Ce Zhang

very large data bases | 2017

LDA*: a robust and large-scale topic modeling system

Lele Yu; Ce Zhang; Yingxia Shao; Bin Cui

international conference on management of data | 2017

An Overreaction to the Broken Machine Learning Abstraction: The ease.ml Vision

Ce Zhang; Wentao Wu; Tian Li

bioRxiv | 2018

Generative adversarial networks as a tool to recover structural information from cryo-electron microscopy data

Min Su; Hantian Zhang; Kevin Schawinski; Ce Zhang; Michael A. Cianfrocco
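As a point of reference for the iterative class of algorithms in the SimRank evaluation listed above, here is a minimal sketch of the naive iterative computation from Jeh and Widom's 2002 definition: s(a, a) = 1, and for a ≠ b, s(a, b) equals C / (|I(a)| |I(b)|) times the sum of s(i, j) over all in-neighbor pairs i ∈ I(a), j ∈ I(b), with s(a, b) = 0 when either node has no in-neighbors. The function name `simrank`, the decay factor C = 0.8, and the fixed iteration count are illustrative assumptions, and this is not one of the ten implementations the study evaluates.

```python
from typing import List

import numpy as np


def simrank(in_neighbors: List[List[int]], c: float = 0.8, iterations: int = 10) -> np.ndarray:
    """Naive iterative SimRank (illustrative sketch).

    in_neighbors[v] lists the in-neighbors of node v; nodes are 0..n-1.
    Returns an n x n matrix of pairwise similarity scores.
    """
    n = len(in_neighbors)
    s = np.eye(n)  # every node is maximally similar to itself
    for _ in range(iterations):
        new = np.eye(n)
        for a in range(n):
            for b in range(n):
                if a == b:
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    continue  # no in-neighbors: similarity stays 0
                # Average previous-round similarity over all in-neighbor pairs.
                total = sum(s[i, j] for i in ia for j in ib)
                new[a, b] = c * total / (len(ia) * len(ib))
        s = new
    return s
```

For the graph with edges 0 → 1 and 0 → 2, nodes 1 and 2 share their only in-neighbor, so `simrank([[], [0], [0]])[1, 2]` is 0.8. Each iteration visits every node pair and every pair of their in-neighbors, which illustrates why iterative methods can be accurate yet harder to scale than random-walk-based ones.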
