Yufei Ding
North Carolina State University
Publication
Featured research published by Yufei Ding.
programming language design and implementation | 2015
Yufei Ding; Jason Ansel; Kalyan Veeramachaneni; Xipeng Shen; Una-May O’Reilly; Saman P. Amarasinghe
A daunting challenge faced by program performance autotuning is input sensitivity, where the best autotuned configuration may vary with different input sets. This paper presents a novel two-level input learning algorithm to tackle the challenge for an important class of autotuning problems, algorithmic autotuning. The new approach uses a two-level input clustering method to automatically refine input grouping, feature selection, and classifier construction. Its design solves a series of open issues that are particularly essential to algorithmic autotuning, including the enormous optimization space, complex influence by deep input features, high cost in feature extraction, and variable accuracy of algorithmic choices. Experimental results show that the new solution yields up to a 3x speedup over using a single configuration for all inputs, and a 34x speedup over a traditional one-level method for addressing input sensitivity in program optimizations.
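To make the mechanism concrete, the sketch below shows one simple way an input-adaptive selector could work: cheap first-level features map an input to its nearest offline-learned group, whose best-measured configuration is then used. The feature choices, group centroids, and configuration names are illustrative assumptions, not the paper's actual two-level algorithm.

```python
# Minimal sketch of input-adaptive configuration selection (illustrative only).
# Assumptions: inputs are summarized by cheap numeric features, and the best
# configuration for each input group has been measured offline.
import math

# Offline result: representative feature vectors ("centroids") of input groups,
# each mapped to the algorithmic configuration that performed best on that group.
GROUPS = [
    {"centroid": (1e3, 0.1), "config": "algorithm_A"},   # small, sparse inputs
    {"centroid": (1e6, 0.9), "config": "algorithm_B"},   # large, dense inputs
]

def extract_cheap_features(inp):
    """First-level features that are inexpensive to compute at run time."""
    size = len(inp)
    density = sum(1 for x in inp if x != 0) / max(size, 1)
    return (size, density)

def select_config(inp):
    """Pick the configuration of the nearest input group."""
    feats = extract_cheap_features(inp)
    best = min(GROUPS, key=lambda g: math.dist(feats, g["centroid"]))
    return best["config"]

if __name__ == "__main__":
    sample_input = [0, 3, 0, 7, 1] * 200
    print("chosen configuration:", select_config(sample_input))
```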
architectural support for programming languages and operating systems | 2014
Yufei Ding; Mingzhou Zhou; Zhijia Zhao; Sarah Eisenstat; Xipeng Shen
This work aims to uncover the full potential of compilation scheduling for JIT-based runtime systems. Compilation scheduling determines the order in which the compilation units (e.g., functions) in a program are compiled or recompiled. It determines when which versions of the units are ready to run, and hence affects performance. Yet it has been a largely overlooked direction in JIT-related research, with some fundamental questions left open: how significant compilation scheduling is for performance, how good the scheduling schemes employed by existing runtime systems are, and whether there is substantial room for improvement. This study proves the strong NP-completeness of the problem, proposes a heuristic algorithm that yields near-optimal schedules, empirically examines the potential of two current scheduling schemes, and explores their relations with JIT designs. It provides the first principled understanding of the complexity and potential of compilation scheduling, offering insights for improving JIT-based runtime systems.
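As a rough illustration of what a compilation schedule optimizes, the sketch below orders compilation units by estimated run-time savings per unit of compile cost. This greedy heuristic is an assumption for illustration, not the near-optimal algorithm proposed in the paper.

```python
# Illustrative greedy compilation-scheduling heuristic (not the paper's algorithm):
# order compilation units by estimated execution-time savings per unit of compile cost.
from dataclasses import dataclass

@dataclass
class Unit:
    name: str
    est_savings: float   # predicted run time saved once the optimized version is ready
    compile_cost: float  # predicted (re)compilation time

def schedule(units):
    """Return units in descending savings-per-cost order (a classic greedy proxy)."""
    return sorted(units, key=lambda u: u.est_savings / u.compile_cost, reverse=True)

if __name__ == "__main__":
    units = [Unit("parse", 40.0, 8.0), Unit("hash", 6.0, 1.0), Unit("render", 90.0, 30.0)]
    print([u.name for u in schedule(units)])  # ['hash', 'parse', 'render']
```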
very large data bases | 2015
Yufei Ding; Xipeng Shen; Madanlal Musuvathi; Todd Mytkowicz
Computing distances among data points is an essential part of many important algorithms in data analytics, graph analysis, and other domains. In each of these domains, developers have spent significant manual effort optimizing algorithms, often through novel applications of the triangle inequality, to minimize the number of distance computations. In this work, we observe that many algorithms across these domains can be generalized as instances of a generic distance-related abstraction. Based on this abstraction, we derive seven principles for correctly applying the triangle inequality to optimize distance-related algorithms. Guided by these findings, we develop the Triangular OPtimizer (TOP), the first software framework able to automatically produce optimized algorithms that match or outperform manually designed algorithms for solving distance-related problems. TOP achieves speedups of up to 237x, and 2.5x on average.
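The sketch below illustrates the basic pruning pattern that such frameworks automate: a precomputed landmark distance yields a triangle-inequality lower bound that lets exact distance computations be skipped. It is a minimal hand-written example, not TOP's generated code.

```python
# Triangle-inequality pruning for a nearest-point query (illustrative only).
# Lower bound via a landmark L: |d(q, L) - d(p, L)| <= d(q, p).
import math

def nearest(query, points, landmark):
    """Find the point nearest to `query`, skipping exact distance computations
    whose triangle-inequality lower bound already exceeds the current best."""
    d_q_l = math.dist(query, landmark)
    # In practice d(p, L) would be precomputed once; we compute it here for brevity.
    bounds = sorted((abs(d_q_l - math.dist(p, landmark)), p) for p in points)
    best_d, best_p = float("inf"), None
    for lb, p in bounds:
        if lb >= best_d:               # every later candidate has an even larger bound
            break
        d = math.dist(query, p)        # exact distance only when the bound is inconclusive
        if d < best_d:
            best_d, best_p = d, p
    return best_p, best_d

if __name__ == "__main__":
    pts = [(float(i), float(i % 7)) for i in range(1000)]
    print(nearest((500.3, 2.1), pts, landmark=(0.0, 0.0)))
```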
symposium on code generation and optimization | 2013
Mingzhou Zhou; Bo Wu; Yufei Ding; Xipeng Shen
Offline program profiling is costly, especially when software updates are frequent. In this paper, we initiate a systematic exploration of cross-version program profile migration, which tries to effectively reuse the still-valid parts of the behavior profiles of an old version of a program for a new version. We explore how profile reusability is affected by various factors in program behaviors, profile formats, and impact analysis, and introduce ProfMig, a framework for flexible migration of various profiles. We demonstrate the effectiveness of the techniques on migrating loop trip-count profiles and dynamic call graphs. The migration saves significant profiling time (48-67% on average) with less than 10% loss in accuracy for most programs.
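A minimal sketch of the reuse idea, under the simplifying assumption that a profile entry remains valid exactly when its function's source is unchanged between versions (ProfMig's actual impact analysis is more nuanced):

```python
# Illustrative sketch of cross-version profile reuse (not ProfMig itself):
# carry over profile entries only for functions whose code is unchanged
# between the old and new versions, and mark the rest for re-profiling.
import hashlib

def code_hash(body: str) -> str:
    return hashlib.sha256(body.encode()).hexdigest()

def migrate_profile(old_profile, old_funcs, new_funcs):
    """old_profile: {func_name: profile_data}; *_funcs: {func_name: source_text}."""
    migrated, invalidated = {}, []
    for name, data in old_profile.items():
        unchanged = (name in new_funcs and
                     code_hash(new_funcs[name]) == code_hash(old_funcs.get(name, "")))
        if unchanged:
            migrated[name] = data        # function unchanged: reuse its profile
        else:
            invalidated.append(name)     # changed or removed: must be re-profiled
    return migrated, invalidated
```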
conference on object-oriented programming systems, languages, and applications | 2017
Yufei Ding; Xipeng Shen
This paper presents GLORE, a novel approach to enabling the detection and removal of large-scoped redundant computations in nested loops. GLORE works on LER-notation, a new representation of computations in both regular and irregular loops. Together with a set of novel algorithms, this makes GLORE able to systematically consider computation reordering at both the expression level and the loop level in a unified manner. GLORE is applicable much more broadly than prior methods and frequently lowers the computational complexity of nested loops that are elusive to prior optimization techniques, producing significantly larger speedups.
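The example below (constructed for illustration; it does not use LER-notation) shows the kind of large-scoped redundancy targeted: a loop-invariant inner computation whose hoisting lowers the asymptotic complexity, not just the constant factor.

```python
# A simple example of large-scoped redundancy in a nested loop (illustrative only).

def before(a, b):
    # O(n*m): the inner sum over b does not depend on i, yet is recomputed every iteration.
    out = []
    for i in range(len(a)):
        s = 0.0
        for j in range(len(b)):
            s += b[j] * b[j]
        out.append(a[i] * s)
    return out

def after(a, b):
    # O(n + m): the loop-invariant sum is computed once and reused, lowering the
    # computational complexity rather than just the constant factor.
    s = sum(x * x for x in b)
    return [x * s for x in a]
```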
conference on object-oriented programming systems, languages, and applications | 2014
Zhijia Zhao; Bo Wu; Mingzhou Zhou; Yufei Ding; Jianhua Sun; Xipeng Shen; Youfeng Wu
Predicting a sequence of upcoming function calls is important for optimizing programs written in modern managed languages (e.g., Java, JavaScript, C#). Existing function call predictors are mainly built on statistical patterns, suitable for predicting a single call but not a sequence of calls. This paper presents a new way to enable call sequence prediction, which exploits program structures through Probabilistic Calling Automata (PCA), a new program representation that captures both the inherent ensuing relations among function calls and the probabilistic nature of execution paths. It shows that PCA-based prediction outperforms existing predictors, yielding substantial speedups when applied to guide Just-In-Time compilation. By enabling accurate, efficient call sequence prediction for the first time, PCA-based predictors open up many new opportunities for dynamic program optimizations.
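As a rough, simplified stand-in for the idea, the sketch below predicts a call sequence by following the most probable transition in a probability-weighted call graph; the actual PCA representation in the paper captures ensuing relations and execution paths more faithfully.

```python
# Simplified sketch of call-sequence prediction over a probability-weighted call graph
# (a rough stand-in for a Probabilistic Calling Automaton; the real PCA is richer).

TRANSITIONS = {
    # current function -> [(next function, probability), ...]  (hypothetical data)
    "main":  [("parse", 0.7), ("load_cache", 0.3)],
    "parse": [("lex", 0.9), ("report_error", 0.1)],
    "lex":   [("emit", 1.0)],
}

def predict_sequence(start, length):
    """Follow the most probable transition at each step to predict upcoming calls."""
    seq, current = [], start
    for _ in range(length):
        nexts = TRANSITIONS.get(current)
        if not nexts:
            break
        current = max(nexts, key=lambda t: t[1])[0]
        seq.append(current)
    return seq

if __name__ == "__main__":
    print(predict_sequence("main", 3))  # ['parse', 'lex', 'emit']
```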
programming language design and implementation | 2017
Yufei Ding; Lin Ning; Hui Guan; Xipeng Shen
Triangular Inequality (TI) has been used in many manual algorithm designs to achieve good efficiency in solving distance calculation-based problems. This paper presents our generalization of the idea into a compiler optimization technique named TI-based strength reduction. The generalization consists of three parts. The first is the establishment of the theoretical foundation of this new optimization through the development of a new form of TI, named Angular Triangular Inequality, along with several fundamental theorems. The second is the revelation of the properties of the new forms of TI and the proposal of guided TI adaptation, a systematic method to address the difficulties in effectively deploying TI optimizations. The third is an integration of the new optimization technique into an open-source compiler. Experiments on a set of data mining and machine learning algorithms show that the new technique can speed up the standard implementations by as much as 134x, and 46x on average, for distance-related problems, outperforming previous TI-based optimizations by 2.35x on average. It also extends the applicability of TI-based optimizations to vector-related problems, producing speedups of tens of times.
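For reference, the classic triangle-inequality bounds that this family of optimizations exploits are shown below; the paper's Angular Triangular Inequality extends the family to vector and angle computations, and its exact form is not reproduced here.

```latex
% Classic triangle-inequality bounds for points x, y and a landmark c:
\[
  \lvert d(x, c) - d(y, c) \rvert \;\le\; d(x, y) \;\le\; d(x, c) + d(y, c)
\]
% When the lower bound already exceeds the pruning threshold in use, the exact
% computation of d(x, y) can be skipped entirely; this is the "strength reduction"
% that the compiler technique applies automatically.
```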
international conference on data engineering | 2017
Guoyang Chen; Yufei Ding; Xipeng Shen
Finding the k nearest neighbors of a query point or a set of query points (KNN) is a fundamental problem in many application domains, but it is computationally expensive. Prior efforts to improve its speed have followed two directions with conflicting considerations: one tries to minimize redundant distance computations but often introduces irregularities into the computation; the other tries to exploit the regularity in computations to best exert the power of GPU-like massively parallel processors, which often introduces extra distance computations. This work gives a detailed study of how to effectively combine the strengths of both approaches. It reconciles the opposing effects of the two directions through elastic algorithmic designs, adaptive runtime configurations, and a set of careful implementation-level optimizations. These efforts lead to a new KNN on GPU named Sweet KNN, the first high-performance triangle-inequality-based KNN on GPU that reaches a sweet spot between redundancy minimization and regularity preservation for various datasets. Experiments on a set of datasets show that Sweet KNN outperforms existing GPU implementations of KNN by up to 120x (11x on average).
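The sketch below captures only the redundancy-minimization half of the design: a candidate point is skipped once its triangle-inequality lower bound cannot beat the current k-th best distance. The GPU regularity and adaptive-configuration aspects of Sweet KNN are not represented here.

```python
# Triangle-inequality pruning for a KNN query (illustrative sketch, CPU-side only).
import heapq
import math

def knn_with_pruning(query, points, landmark, k):
    """Return the k smallest distances from `query`, skipping candidates whose
    lower bound |d(q,L) - d(p,L)| cannot beat the current k-th best distance."""
    d_q_l = math.dist(query, landmark)
    # Sort candidates by their triangle-inequality lower bound to the query.
    cands = sorted((abs(d_q_l - math.dist(p, landmark)), p) for p in points)
    heap = []  # max-heap (negated distances) holding the current k best
    for lb, p in cands:
        if len(heap) == k and lb >= -heap[0]:
            break                          # no later candidate can enter the top k
        d = math.dist(query, p)
        if len(heap) < k:
            heapq.heappush(heap, -d)
        elif d < -heap[0]:
            heapq.heapreplace(heap, -d)
    return sorted(-d for d in heap)

if __name__ == "__main__":
    pts = [(float(i), float(i % 5)) for i in range(1000)]
    print(knn_with_pruning((123.4, 2.0), pts, landmark=(0.0, 0.0), k=3))
```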
international conference on machine learning | 2015
Yufei Ding; Yue Zhao; Xipeng Shen; Madanlal Musuvathi; Todd Mytkowicz
international conference on data engineering | 2018
Hui Guan; Yufei Ding; Xipeng Shen; Hamid Krim