Arindam Saha

Publication

Featured research published by Arindam Saha.

Annual Simulation Symposium | 1995

A simulator for real-time parallel processing architectures

Arindam Saha

A time-driven, flit-based, wormhole-routed, parallel processor network simulator has been designed in C with a user-friendly graphical user interface (GUI). To accommodate the unique requirements of real-time networks, the simulator is based on prioritized queues supporting various resource allocation policies. In the simulator, special care is taken to prevent various possible kinds of overlaps (in time) and deadlocks. Because the target systems are real-time, all messages are associated with priorities and every virtual channel is associated with the priority of the message it stores. The simulator contains a simple test for the convergence of the average latency, and the throughput is always monitored to verify that it converges to the applied load. A candidate network function is accurately defined and this characterization is detailed enough to be simulated. The simulator is used to evaluate a set of communication characteristics that are deemed desirable for real-time parallel processors with respect to performance.

Southeastcon | 1993

Design and FPGA implementation of efficient integer arithmetic algorithms

Arindam Saha; R. Krishnamurthy

The design and implementation of various integer arithmetic algorithms are presented. The design process and the various criteria involved in the building of an efficient O(N) 32-bit bit stream-systolic multiplier chip from the algorithmic level to the final implementation stage are described. An entirely different approach is used to implement integer division using the Chinese remainder theorem. This algorithm generates the first N significant bits of the reciprocal of a number in O(log N) bit steps. This algorithm is asymptotically faster than the implementations currently available for integer division. The scheme is attractive as it forms a basis for the implementation of further arithmetic functions, like the generation of a power series and multiplication of N N-bit operands in O(log N) time steps. The evaluation and implementation of the modulus function are considered.

Rapid System Prototyping | 1994

Some design issues in multi-chip FPGA implementation of DSP algorithms

Arindam Saha; Rangasayee Krishnamurthy

Field programmable gate arrays (FPGAs) provide an innovative and flexible platform to implement and evaluate digital signal processing (DSP) applications. A CAD design methodology which is used to implement DSP algorithms is presented. An introduction is given to the various issues involved in the multi-chip partitioning of large DSP implementations, and approaches towards efficient auto-partitioners are also discussed in detail. The design and implementation of an 8-point 1D discrete cosine transform (DCT) and its inverse (IDCT) on a processor with FPGAs are presented in this paper, as an illustrative example of a typical DSP algorithm. The processor uses 16-bit precision, is implemented on six Xilinx 4000 type FPGAs and operates at 40 MHz.
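
As a point of reference for the 8-point 1D DCT/IDCT pair mentioned in the abstract, the following is a minimal software sketch of the textbook orthonormal DCT-II and its inverse. It is an assumption-laden illustration, not the authors' FPGA design: it uses floating point rather than the 16-bit precision of the processor, and the function names `dct_1d` and `idct_1d` are hypothetical.

```python
import math

N = 8  # transform length, matching the 8-point transform in the abstract


def _scale(k):
    # Orthonormal scaling so that the inverse is the transpose of the forward transform.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct_1d(x):
    """Forward orthonormal DCT-II of a length-8 sequence (direct O(N^2) form)."""
    return [
        _scale(k) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                        for n in range(N))
        for k in range(N)
    ]


def idct_1d(X):
    """Inverse transform (DCT-III with matching scaling) of 8 coefficients."""
    return [
        sum(_scale(k) * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k in range(N))
        for n in range(N)
    ]
```

Applying `idct_1d(dct_1d(samples))` to any 8-sample list should return the original samples up to floating-point rounding. A hardware version would typically replace the direct sums with a fast factorization and fixed-point arithmetic.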

IEEE Transactions on Consumer Electronics | 1995

Parallel programmable algorithm and architecture for real-time motion estimation of various video applications

Arindam Saha; Raja Neogi

This paper describes a parallel architecture for a new motion estimation algorithm that combines full search block matching with sparse search. Our solution caters to a wide variety of applications with various video data rates and various search ranges. Hence our architecture is programmable. Our solution also estimates the motion vector in real-time by using parallel processing. The multigrid algorithm works in at most three sequential passes. Detailed data flow diagrams show the exact data use at every processor at every cycle time. This data flow is formalized with the derivation of exact analytic expressions. The 64-processor architecture consists of four clusters of 16 processors each, all working concurrently with each cluster working in a pipelined fashion. Novel hardware structures are designed to meet the data flow requirements of the different passes. Enormous data reuse is performed to minimize the on-chip data storage. The novel VLSI architecture can easily be implemented on a single chip.
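
The full search block matching component named in the abstract can be sketched in a few lines. This is only the generic exhaustive search with a sum-of-absolute-differences (SAD) cost, which is an assumed cost function; it does not reproduce the paper's multigrid passes, sparse search, or parallel data flow. The names `sad` and `full_search` and the frame layout (lists of rows) are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between a block in `cur` and a displaced block in `ref`."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def full_search(cur, ref, bx, by, bs, rng):
    """Try every displacement within +/- rng and return ((dx, dy), cost) of the best match."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            # Skip candidates whose reference block would fall outside the frame.
            if by + dy < 0 or bx + dx < 0:
                continue
            if by + dy + bs > height or bx + dx + bs > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The search visits (2·rng + 1)² candidates per block, which is why exhaustive matching is usually paired with a coarser search or with parallel hardware when real-time operation is required.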

IEEE Transactions on Consumer Electronics | 1995

Embedded parallel divide-and-conquer video decompression algorithm and architecture for HDTV applications

Raja Neogi; Arindam Saha

DCT/IDCT based source coding and decoding techniques are widely accepted in HDTV systems and other MPEG based applications. We propose a new direct 2-D IDCT algorithm based on the parallel divide-and-conquer approach. The algorithm distributes computation by considering one transformed coefficient at a time and doing partial computation and updating as every coefficient arrives. A novel parallel and fully pipelined architecture with an effective processing time of one cycle per pixel for an N × N size block is designed to implement the algorithm. A unique feature of the architecture is that it integrates inverse-shuffling, inverse-quantization, inverse-source-coding and motion-compensation into a single compact data-path. The entire block of pixel values is sampled in a single cycle for post processing after decompression. We use only (N/2(N/2+1))/2 multipliers and N^2 adders. The configuration of the adders is such that motion compensation is realized in a single cycle following decompression.
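
The idea of updating the output block as each coefficient arrives can be illustrated with a small software sketch. It assumes the standard orthonormal 2-D DCT basis and accumulates each coefficient's contribution into every pixel; it is not the paper's pipelined architecture and does not exploit the symmetries that reduce the multiplier count. The names `basis` and `idct2_streaming` are hypothetical.

```python
import math


def basis(n_points, k, n):
    """Value of the k-th orthonormal DCT basis function at sample n."""
    scale = math.sqrt(1.0 / n_points) if k == 0 else math.sqrt(2.0 / n_points)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * n_points))


def idct2_streaming(coeff_stream, n_points):
    """Accumulate an N x N pixel block from (u, v, value) coefficients, one at a time."""
    pixels = [[0.0] * n_points for _ in range(n_points)]
    for u, v, value in coeff_stream:
        if value == 0:
            continue  # a zero coefficient contributes nothing to any pixel
        for y in range(n_points):
            row_weight = value * basis(n_points, u, y)
            for x in range(n_points):
                pixels[y][x] += row_weight * basis(n_points, v, x)
    return pixels
```

Because the result is a plain sum over coefficients, they can arrive in any order, for example in zig-zag scan order. Evaluating the abstract's formula for N = 8 gives (4 × 5)/2 = 10 multipliers and 64 adders.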

Southeastcon | 1994

Rapid prototyping and performance evaluation of recoded multipliers using FPGAs

Arindam Saha; R. Krishnamurthy

High speed recoded parallel multipliers constitute an affordable improvement compared to the serial-parallel add-shift designs. We present a detailed discussion of the development of these recoded multiplier algorithms and their FPGA implementations. The various issues involved in the design process are highlighted. The cost-performance comparison of the various recoded multipliers is studied and a discussion on the design methodology is also presented.

Southeastcon | 1993

Minmax recurrences in analysis of algorithms

Arindam Saha; Meghanad D. Wagh

The solution of a challenging minmax recurrence relation is presented. This relation is derived from a model of parallel divide-and-conquer computations that incorporates the unavoidable and significant parallel processing overheads. The minmax recurrence is solved by characterizing the properties of the optimal partition sizes. It is shown that the optimal partition size, given a problem size n, is nontrivial and very different from the ad hoc n/2 value taken conventionally. It is also shown that the complexity of the algorithm reduces from O(n) to O(√n) by choosing the optimal partition size instead of the equal partition size at every stage of the recursion. The authors also survey some of the existing theory of minmax recurrence relations. They mention three interesting recurrences, how they are derived in the analysis of algorithms, and how they are solved by various authors.

Applied Mathematics and Computation | 1996

Solutions of two minmax recurrences in parallel processing with variable recombination overhead

Arindam Saha; Meghanad D. Wagh

A variety of parallel algorithms are based on the paradigm of recursion. However, because of the partition and recombination overheads involved, such algorithms may perform poorly on real architectures. This paper deals with solutions of two minmax recurrence relations that result from optimal execution of parallel recursive algorithms in the presence of variable recombination overheads and constant partition overheads. We solve two models with the recombination overheads shown as a linear function of the partition size and the problem size, respectively. For model 1, we show that the optimal partition sizes for problem size n at any stage of the recursion can easily be found in O(log log n) time using an O(log n) size table. Model 2 is solved for cases k ≤ λ and k > λ, where k and λ are constants related to the parallel computing overheads. For the first case, the optimal partition size is derived to be [n/2] and a closed-form solution of the complexity is exactly characterized. For the second case, the optimal partition size is found to be in the range [(n − t)/2] ± 1, where t is a nonnegative constant that depends on k and λ, and the overall complexity is shown to be O(2λn + (k − λt) log_2(n + t)). This paper develops the theory of solutions of minmax recurrences and provides some insight into this area of computational mathematics which has received little attention so far.
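
To make the notion of a minmax recurrence concrete, here is a small brute-force solver for a generic recurrence of the form T(n) = min over k of [max(T(k), T(n − k)) + overhead(n, k)]. This form, the base case T(1), and the `overhead` callback are assumptions chosen for illustration; the abstract does not state the exact recurrences of the two models, so this is not a reproduction of them. The name `solve_minmax` is hypothetical.

```python
def solve_minmax(n_max, overhead, base=0.0):
    """Tabulate T(n) and one optimal partition size k for n = 1..n_max (n_max >= 1).

    Assumed recurrence: T(1) = base and, for n >= 2,
    T(n) = min over 1 <= k < n of max(T(k), T(n - k)) + overhead(n, k).
    """
    T = [0.0] * (n_max + 1)       # index 0 is unused
    best_k = [0] * (n_max + 1)    # 0 means "no partition" (base case)
    T[1] = base
    for n in range(2, n_max + 1):
        best_cost = None
        for k in range(1, n):
            # The two subproblems run in parallel, so the slower one dominates.
            cost = max(T[k], T[n - k]) + overhead(n, k)
            if best_cost is None or cost < best_cost:
                best_cost, best_k[n] = cost, k
        T[n] = best_cost
    return T, best_k
```

With a constant overhead, for instance `solve_minmax(16, lambda n, k: 1.0)`, the table grows logarithmically in n. Passing an overhead that depends on `k` or `n`, such as `lambda n, k: 0.5 * k + 1.0`, lets one inspect how the best partition size drifts away from n/2. The table costs O(n²) time to build, so it is a tool for exploring small cases rather than a substitute for a closed-form solution.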

Proceedings of Third Workshop on Parallel and Distributed Real-Time Systems | 1995

A study of network routers for real-time parallel computers

Arindam Saha; Raja Neogi

A candidate network function is accurately defined for real-time parallel computers. A concise, time-driven, flit-based, priority-driven, wormhole-routed, network simulator has been designed. Experimentation is performed by monitoring the latency and the throughput with variations in different network parameters. Initially the destination address, message length and message priority are generated randomly with a uniform distribution. Then, various non-uniformities are introduced to mimic realistic applications. Results are plotted and analyzed.

Southeastcon | 1993

Solution of a parallel divide-and-conquer model in the presence of overheads

Arindam Saha; N. Muthukumar

The authors describe the model for a parallel divide-and-conquer algorithm incorporating both the symmetric and nonsymmetric overheads inherent in any parallel computing environment. An algorithm for computing optimal partitions is derived. This algorithm separates problem sizes into classes of problems that may use the same optimal partition size.

Collaboration

Dive into Arindam Saha's collaborations.

Top Co-Authors

Meghanad D. Wagh

Lehigh University

Raja Neogi

Motorola

Mississippi State University
