Featured Researches

Mathematical Software

A 55-line code for large-scale parallel topology optimization in 2D and 3D

This paper presents a 55-line code written in python for 2D and 3D topology optimization (TO) based on the open-source finite element computing software (FEniCS), equipped with various finite element tools and solvers. PETSc is used as the linear algebra back-end, which results in significantly less computational time than standard python libraries. The code is designed based on the popular solid isotropic material with penalization (SIMP) methodology. Extensions to multiple load cases, different boundary conditions, and incorporation of passive elements are also presented. Thus, this implementation is the most compact implementation of SIMP based topology optimization for 3D as well as 2D problems. Utilizing the concept of Euclidean distance matrix to vectorize the computation of the weight matrix for the filter, we have achieved a substantial reduction in the computational time and have also made it possible for the code to work with complex ground structure configurations. We have also presented the code's extension to large-scale topology optimization problems with support for parallel computations on complex structural configuration, which could help students and researchers explore novel insights into the TO problem with dense meshes. Appendix-A contains the complete code, and the website: \url{this https URL} also contains the complete code.

Read more
Mathematical Software

A Benchmark of Selected Algorithmic Differentiation Tools on Some Problems in Computer Vision and Machine Learning

Algorithmic differentiation (AD) allows exact computation of derivatives given only an implementation of an objective function. Although many AD tools are available, a proper and efficient implementation of AD methods is not straightforward. The existing tools are often too different to allow for a general test suite. In this paper, we compare fifteen ways of computing derivatives including eleven automatic differentiation tools implementing various methods and written in various languages (C++, F#, MATLAB, Julia and Python), two symbolic differentiation tools, finite differences, and hand-derived computation. We look at three objective functions from computer vision and machine learning. These objectives are for the most part simple, in the sense that no iterative loops are involved, and conditional statements are encapsulated in functions such as {\tt abs} or {\tt logsumexp}. However, it is important for the success of algorithmic differentiation that such `simple' objective functions are handled efficiently, as so many problems in computer vision and machine learning are of this form. Of course, our results depend on programmer skill, and familiarity with the tools. However, we contend that this paper presents an important datapoint: a skilled programmer devoting roughly a week to each tool produced the timings we present. We have made our implementations available as open source to allow the community to replicate and update these benchmarks.

Read more
Mathematical Software

A Blackbox Polynomial System Solver on Parallel Shared Memory Computers

A numerical irreducible decomposition for a polynomial system provides representations for the irreducible factors of all positive dimensional solution sets of the system, separated from its isolated solutions. Homotopy continuation methods are applied to compute a numerical irreducible decomposition. Load balancing and pipelining are techniques in a parallel implementation on a computer with multicore processors. The application of the parallel algorithms is illustrated on solving the cyclic n -roots problems, in particular for n=8,9 , and~12.

Read more
Mathematical Software

A Domain-Specific Language and Editor for Parallel Particle Methods

Domain-specific languages (DSLs) are of increasing importance in scientific high-performance computing to reduce development costs, raise the level of abstraction and, thus, ease scientific programming. However, designing and implementing DSLs is not an easy task, as it requires knowledge of the application domain and experience in language engineering and compilers. Consequently, many DSLs follow a weak approach using macros or text generators, which lack many of the features that make a DSL a comfortable for programmers. Some of these features---e.g., syntax highlighting, type inference, error reporting, and code completion---are easily provided by language workbenches, which combine language engineering techniques and tools in a common ecosystem. In this paper, we present the Parallel Particle-Mesh Environment (PPME), a DSL and development environment for numerical simulations based on particle methods and hybrid particle-mesh methods. PPME uses the meta programming system (MPS), a projectional language workbench. PPME is the successor of the Parallel Particle-Mesh Language (PPML), a Fortran-based DSL that used conventional implementation strategies. We analyze and compare both languages and demonstrate how the programmer's experience can be improved using static analyses and projectional editing. Furthermore, we present an explicit domain model for particle abstractions and the first formal type system for particle methods.

Read more
Mathematical Software

A Faster, More Intuitive RooFit

RooFit and RooStats, the toolkits for statistical modelling in ROOT, are used in most searches and measurements at the Large Hadron Collider as well as at B factories. Larger datasets to be collected at e.g. the High-Luminosity LHC will enable measurements with higher precision, but will require faster data processing to keep fitting times stable. In this work, a simplification of RooFit's interfaces and a redesign of its internal dataflow is presented. Interfaces are being extended to look and feel more STL-like to be more accessible both from C++ and Python to improve interoperability and ease of use, while maintaining compatibility with old code. The redesign of the dataflow improves cache locality and data loading, and can be used to process batches of data with vectorised SIMD computations. This reduces the time for computing unbinned likelihoods by a factor four to 16. This will allow to fit larger datasets of the future in the same time or faster than today's fits.

Read more
Mathematical Software

A Flexible Framework for Parallel Multi-Dimensional DFTs

Multi-dimensional discrete Fourier transforms (DFT) are typically decomposed into multiple 1D transforms. Hence, parallel implementations of any multi-dimensional DFT focus on parallelizing within or across the 1D DFT. Existing DFT packages exploit the inherent parallelism across the 1D DFTs and offer rigid frameworks, that cannot be extended to incorporate both forms of parallelism and various data layouts to enable some of the parallelism. However, in the era of exascale, where systems have thousand of nodes and intricate network topologies, flexibility and parallel efficiency are key aspects all multi-dimensional DFT frameworks need to have in order to map and scale the computation appropriately. In this work, we present a flexible framework, built on the Redistribution Operations and Tensor Expressions (ROTE) framework, that facilitates the development of a family of parallel multi-dimensional DFT algorithms by 1) unifying the two parallelization schemes within a single framework, 2) exploiting the two different parallelization schemes to different degrees and 3) using different data layouts to distribute the data across the compute nodes. We demonstrate the need of a versatile framework and thus a need for a family of parallel multi-dimensional DFT algorithms on the K-Computer, where we show almost linear strong scaling results for problem sizes of 1024^3 on 32k compute nodes.

Read more
Mathematical Software

A Framework for General Sparse Matrix-Matrix Multiplication on GPUs and Heterogeneous Processors

General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse matrix is unknown in advance, (2) very expensive parallel insert operations at random positions in the resulting sparse matrix dominate the execution time, and (3) load balancing must account for sparse data in both input matrices. In this work we propose a framework for SpGEMM on GPUs and emerging CPU-GPU heterogeneous processors. This framework particularly focuses on the above three problems. Memory pre-allocation for the resulting matrix is organized by a hybrid method that saves a large amount of global memory space and efficiently utilizes the very limited on-chip scratchpad memory. Parallel insert operations of the nonzero entries are implemented through the GPU merge path algorithm that is experimentally found to be the fastest GPU merge approach. Load balancing builds on the number of necessary arithmetic operations on the nonzero entries and is guaranteed in all stages. Compared with the state-of-the-art CPU and GPU SpGEMM methods, our approach delivers excellent absolute performance and relative speedups on various benchmarks multiplying matrices with diverse sparsity structures. Furthermore, on heterogeneous processors, our SpGEMM approach achieves higher throughput by using re-allocatable shared virtual memory. The source code of this work is available at this https URL

Read more
Mathematical Software

A Functional Package for Automatic Solution of Ordinary Differential Equations with Spectral Methods

We present a Python module named PyCheb, to solve the ordinary differential equations by using spectral collocation method. PyCheb incorporates discretization using Chebyshev points, barycentric interpolation and iterate methods. With this Python module, users can initialize the ODEsolver class by passing attributes, including the both sides of a given differential equation, boundary conditions, and the number of Chebyshev points, which can also be generated automatically by the ideal precision, to the constructor of ODEsolver class. Then, the instance of the ODEsolver class can be used to automatically determine the resolution of the differential equation as well as generate the graph of the high-precision approximate solution. (If you have any questions, please send me an email and I will reply ASAP. e-mail:[email protected]/[email protected] http URL)

Read more
Mathematical Software

A GPU-based Multi-level Algorithm for Boundary Value Problems

A novel and scalable geometric multi-level algorithm is presented for the numerical solution of elliptic partial differential equations, specially designed to run with high occupancy of streaming processors inside Graphics Processing Units(GPUs). The algorithm consists of iterative, superposed operations on a single grid, and it is composed of two simple full-grid routines: a restriction and a coarsened interpolation-relaxation. The restriction is used to collect sources using recursive coarsened averages, and the interpolation-relaxation simultaneously applies coarsened finite-difference operators and interpolations. The routines are scheduled in a saw-like refining cycle. Convergence to machine precision is achieved repeating the full cycle using accumulated residuals and successively collecting the solution. Its total number of operations scale linearly with the number of nodes. It provides an attractive fast solver for Boundary Value Problems (BVPs), specially for simulations running entirely in the GPU. Applications shown in this work include the deformation of two-dimensional grids, the computation of three-dimensional streamlines for a singular trifoil-knot vortex and the calculation of three-dimensional electric potentials in heterogeneous dielectric media.

Read more
Mathematical Software

A High-Performance Implementation of a Robust Preconditioner for Heterogeneous Problems

We present an efficient implementation of the highly robust and scalable GenEO preconditioner in the high-performance PDE framework DUNE. The GenEO coarse space is constructed by combining low energy solutions of a local generalised eigenproblem using a partition of unity. In this paper we demonstrate both weak and strong scaling for the GenEO solver on over 15,000 cores by solving an industrially motivated problem with over 200 million degrees of freedom. Further, we show that for highly complex parameter distributions arising in certain real-world applications, established methods become intractable while GenEO remains fully effective. The purpose of this paper is two-fold: to demonstrate the robustness and high parallel efficiency of the solver and to document the technical details that are crucial to the efficiency of the code.

Read more

Ready to get started?

Join us today