Robert Niewiadomski | Researchain

Publication

Featured research published by Robert Niewiadomski.

acm symposium on parallel algorithms and architectures | 2007

Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms

Timothy Furtak; José Nelson Amaral; Robert Niewiadomski

Most contemporary processors offer some version of Single Instruction Multiple Data (SIMD) machinery: vector registers and instructions to manipulate data stored in such registers. The central idea of this paper is to use these SIMD resources to improve the performance of the tail of recursive sorting algorithms. When the number of elements to be sorted reaches a set threshold, data is loaded into the vector registers, manipulated in-register, and the result stored back to memory. Three implementations of sorting with two different SIMD machineries, x86-64's SSE2 and the G5's AltiVec, demonstrate that this idea delivers significant speed improvements. The improvements provided are orthogonal to the gains obtained through empirical search for a suitable sorting algorithm [11]. When integrated with the Dynamically Tuned Sorting Library (DTSL), this new code generation strategy reduces the time spent by DTSL by up to 22% for moderately-sized arrays, with greater relative reductions for small arrays. Wall-clock performance of d-heaps is improved by up to 39% using a similar technique.
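The "tail" idea in the abstract above can be illustrated with a minimal scalar sketch. It is not the paper's SSE2 or AltiVec code: the threshold of 4, the names `sort_small` and `hybrid_sort`, and the use of a quicksort-style recursion are all assumptions made for illustration. The fixed compare-exchange sequence stands in for the branch-free min/max operations a SIMD unit would apply to values held in registers.

```python
import math

# A standard 5-comparator sorting network for 4 inputs.
# Each pair (i, j) is one compare-exchange: the smaller value goes to slot i.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
THRESHOLD = 4  # hypothetical cutoff; the paper tunes its own threshold


def sort_small(values):
    """Sort at most THRESHOLD numbers with a fixed compare-exchange sequence.

    The list `regs` plays the role of the vector registers: data is loaded,
    rearranged with min/max only (no data-dependent branches), then stored.
    """
    n = len(values)
    # Pad with +inf so the fixed 4-input network also handles shorter inputs.
    regs = list(values) + [math.inf] * (THRESHOLD - n)
    for i, j in NETWORK_4:
        lo, hi = min(regs[i], regs[j]), max(regs[i], regs[j])
        regs[i], regs[j] = lo, hi
    return regs[:n]


def hybrid_sort(values):
    """Recursive sort that hands small sub-problems to the network."""
    if len(values) <= THRESHOLD:
        return sort_small(values)
    pivot = values[len(values) // 2]
    less = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    greater = [v for v in values if v > pivot]
    return hybrid_sort(less) + equal + hybrid_sort(greater)


if __name__ == "__main__":
    print(hybrid_sort([5, 2, 9, 1, 7, 3, 8, 6, 4, 0]))
```

Because the comparator sequence is fixed in advance, the small-case sort performs the same operations regardless of the input order, which is what makes it a natural fit for vector min/max instructions.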

international conference on parallel processing | 2006

A Parallel External-Memory Frontier Breadth-First Traversal Algorithm for Clusters of Workstations

Robert Niewiadomski; José Nelson Amaral; Robert C. Holte

This paper presents a parallel external-memory algorithm for performing a breadth-first traversal of an implicit graph on a cluster of workstations. The algorithm is a parallel version of the sorting-based external-memory frontier breadth-first traversal with delayed duplicate detection algorithm. The algorithm distributes the workload according to intervals that are computed at runtime via a sampling-based process. We present an experimental evaluation of the algorithm where we compare its performance to that of its sequential counterpart on the implicit graphs of two classic planning problems. The speedups attained by the algorithm over its sequential counterpart are consistently near linear and frequently above linear. Analysis reveals that the algorithm is proficient at distributing the workload and that increasing the number of samples obtained by the sampling-based process improves workload distribution. Analysis also reveals that the algorithm benefits from the caching of external memory in internal memory that is done by the operating system.
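For readers unfamiliar with the sequential building block named in the abstract above, the following is a small in-memory sketch of sorting-based frontier breadth-first traversal with delayed duplicate detection. It is a sketch under assumptions, not the paper's algorithm: it is sequential, it keeps layers in Python lists rather than on disk, it assumes an undirected graph (so only the two most recent layers need to be subtracted), and the function names are hypothetical. The sampling-based interval distribution of the parallel version is not shown.

```python
def sorted_unique(items):
    """Sort the generated states, then drop adjacent duplicates."""
    out = []
    for x in sorted(items):
        if not out or out[-1] != x:
            out.append(x)
    return out


def subtract_sorted(a, b):
    """Return the items of sorted list a that do not occur in sorted list b.

    A single linear scan over both lists, as one would do over sorted files.
    """
    out, j = [], 0
    for x in a:
        while j < len(b) and b[j] < x:
            j += 1
        if j == len(b) or b[j] != x:
            out.append(x)
    return out


def frontier_bfs(start, successors):
    """Return the number of states at each depth of the traversal.

    Duplicates are not checked when a state is generated. Instead, each
    whole layer is generated first and duplicates are removed afterwards
    by sorting and by subtracting the current and previous layers.
    """
    prev, curr = [], [start]
    depth_counts = []
    while curr:
        depth_counts.append(len(curr))
        generated = [s for state in curr for s in successors(state)]
        candidates = sorted_unique(generated)
        nxt = subtract_sorted(subtract_sorted(candidates, curr), prev)
        prev, curr = curr, nxt
    return depth_counts


if __name__ == "__main__":
    # Implicit graph: a cycle of 6 nodes, neighbours computed on demand.
    def ring(n):
        return [(n - 1) % 6, (n + 1) % 6]

    print(frontier_bfs(0, ring))
```

Only sequential scans and sorts touch the layers, which is the property that makes this style of traversal suitable for external memory.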

computing and combinatorics conference | 2004

A space-efficient algorithm for sequence alignment with inversions and reversals

Zhi-Zhong Chen; Yong Gao; Guohui Lin; Robert Niewiadomski; Yang Wang; Junfeng Wu

A dynamic programming algorithm to find an optimal alignment for a pair of DNA sequences has been described by Schöniger and Waterman. The alignments use not only substitutions, insertions, and deletions of single nucleotides, but also inversions, which are the reversed complements of substrings of the sequences. With the restriction that the inversions are pairwise non-intersecting, their proposed algorithm runs in O(n^2 m^2) time and consumes O(n^2 m^2) space, where n and m are the lengths of the two input sequences. We develop a space-efficient algorithm to compute such an optimal alignment which consumes only O(nm) space within the same amount of time. Our algorithm enables the computation for a pair of DNA sequences of length up to 10,000 to be carried out on an ordinary desktop computer. A simulation study is conducted to verify some biological facts about gene shuffling across species.
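The inversion operation defined in the abstract above (replacing a substring by its reversed complement) can be made concrete with a short sketch. This illustrates only the operation itself, not the alignment algorithm; the function names and the half-open index convention are assumptions.

```python
# Watson-Crick base pairing for DNA: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every nucleotide, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_substring(seq, i, j):
    """Apply an inversion to seq[i:j] (half-open interval, an assumption)."""
    return seq[:i] + reverse_complement(seq[i:j]) + seq[j:]


if __name__ == "__main__":
    print(reverse_complement("AACG"))
    print(invert_substring("GGAACGTT", 2, 6))
```

Applying the same inversion twice restores the original sequence, since the reversed complement of a reversed complement is the string itself.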

ACM Journal of Experimental Algorithms | 2004

A performance study of data layout techniques for improving data locality in refinement-based pathfinding

Robert Niewiadomski; José Nelson Amaral; Robert C. Holte

The widening gap between processor speed and memory latency increases the importance of crafting data structures and algorithms to exploit temporal and spatial locality. Refinement-based pathfinding algorithms, such as Classic Refinement (CR), find quality paths in very large sparse graphs where traditional search techniques fail to generate paths in acceptable time. In this paper, we present a performance evaluation study of three simple data structure transformations aimed at improving the data reference locality of CR. These transformations are robust to changes in computer architecture and the degree of compiler optimization. We test our alternative designs on four contemporary architectures, using two compilers for each machine. In our experiments, the application of these techniques results in performance improvements of up to 67% with consistent improvements above 15%. Analysis reveals that these improvements stem from improved data reference locality at the page level and to a lesser extent at the cache line level.

ieee international conference on high performance computing, data, and analytics | 2003

Crafting Data Structures: A Study of Reference Locality in Refinement-Based Pathfinding

Robert Niewiadomski; José Nelson Amaral; Robert C. Holte

The widening gap between processor speed and memory latency increases the importance of crafting data structures and algorithms to exploit temporal and spatial locality. Refinement-based pathfinding algorithms, such as Classic Refinement, find near-optimal paths in very large sparse graphs where traditional search techniques fail to generate paths in acceptable time. In this paper, we present a performance evaluation study of three simple techniques, each based on a data structure transformation, aimed at improving the data reference locality of Classic Refinement. In our experiments, these techniques improved data reference locality, resulting in consistently positive performance improvements upwards of 51.2%. In addition, these techniques appear to be orthogonal to compiler optimizations and robust with respect to hardware architecture.

computing and combinatorics conference | 2003

A space efficient algorithm for sequence alignment with inversions

Yong Gao; Junfeng Wu; Robert Niewiadomski; Yang Wang; Zhi-Zhong Chen; Guohui Lin

A dynamic programming algorithm to find an optimal alignment for a pair of DNA sequences has been described by Schöniger and Waterman. The alignments use not only substitutions, insertions, and deletions of single nucleotides, but also inversions, which are the reversed complements of substrings of the sequences. With the restriction that the inversions are pairwise non-intersecting, their proposed algorithm runs in O(n^2 m^2) time and consumes O(n^2 m^2) space, where n and m are the lengths of the two input sequences. We develop a space-efficient algorithm to compute such an optimal alignment which consumes only O(nm) space within the same amount of time. Our algorithm enables the computation for a pair of DNA sequences of length up to 10,000 to be carried out on an ordinary desktop computer. A simulation study is conducted to verify some biological facts about gene shuffling across species.

international conference on parallel processing | 2008

The MAP3S Static-and-Regular Mesh Simulation and Wavefront Parallel-Programming Patterns

Robert Niewiadomski; José Nelson Amaral; Duane Szafron

This paper presents the simulation and wavefront parallel-programming patterns of the MAP3S pattern-based parallel programming system for distributed-memory environments. Both patterns target iterative computations on static-and-regular meshes. In addition to providing performance-oriented features, such as asynchronous communication and distribution of the computational workload that is tailored to fit the computation, the patterns also provide usability-oriented features, such as direct mesh-access, mesh memory-footprint distribution, and a versatile data-dependency specification scripting-language. Parallel programs developed using MAP3S achieve significant performance gains and capability enhancements on both low-end and high-end interconnect-equipped distributed-memory systems.

Archive | 2008

Effective Bidirectional A* with Frontier Search and External-Memory Utilization

Robert Niewiadomski; José Nelson Amaral; Robert C. Holte

We present an advanced Bidirectional A* algorithm featuring an application of Frontier Search and a strategy for the performance-efficient utilization of External Memory. We present the results of an experimental evaluation demonstrating that this algorithm is capable of tackling exceptionally large state spaces while consuming significantly less time and space than its A* counterpart. For instance, in solving difficult instances of the 5-by-5 Sliding-Tile Puzzle and the 4-peg Towers-of-Hanoi problems, using additive pattern-database heuristics, the typical reductions in time- and space-consumption are in the range of one to two orders of magnitude.

national conference on artificial intelligence | 2006