
Sachin B. Patkar

Indian Institute of Technology Bombay

Publication

Featured research published by Sachin B. Patkar.

International Conference on VLSI Design | 2009

FPGA Based High Performance Double-Precision Matrix Multiplication

Vinay Kumar; Siddharth Joshi; Sachin B. Patkar; H. Narayanan

We present two designs (I and II) for IEEE 754 double-precision floating-point matrix multiplication, an important kernel in many tile-based BLAS algorithms, optimized for implementation on high-end FPGAs. The designs, both based on the rank-1 update scheme, can handle arbitrary matrix sizes and sustain their peak performance except during an initial latency period. Through these designs, the trade-offs between local memory and bandwidth in an FPGA implementation are demonstrated, and an analysis is presented for the optimal choice of design parameters. The designs, implemented on a Virtex-5 SX240T FPGA, scale gracefully from 1 to 40 processing elements (PEs) with less than 1% degradation in the design frequency of 373 MHz. With 40 PEs and a design speed of 373 MHz, a sustained performance of 29.8 GFLOPS is possible with a bandwidth requirement of 750 MB/s for design II and 5.9 GB/s for design I.
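The rank-1 update scheme named in this abstract can be illustrated with a minimal software sketch. It only shows the arithmetic ordering (the product is accumulated as a sum of outer products of a column of A with a row of B), not the FPGA designs themselves. The function names are hypothetical, and the throughput helper assumes each PE completes one multiply and one add per cycle, which the abstract does not state explicitly; under that assumption the quoted 29.8 GFLOPS figure is reproduced.

```python
def matmul_rank1(A, B):
    """Compute C = A * B as a sum of rank-1 (outer-product) updates.

    A is n x k and B is k x m, both given as lists of rows.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for p in range(k):
        col = [A[i][p] for i in range(n)]  # p-th column of A
        row = B[p]                         # p-th row of B
        # Rank-1 update: C += col (outer product) row
        for i in range(n):
            a = col[i]
            for j in range(m):
                C[i][j] += a * row[j]
    return C


def peak_gflops(num_pes, freq_mhz, flops_per_pe_per_cycle=2):
    """Peak rate in GFLOPS. The default of 2 flops per PE per cycle
    (one multiply plus one add) is an assumption, not taken from the paper."""
    return num_pes * flops_per_pe_per_cycle * freq_mhz / 1000.0


if __name__ == "__main__":
    print(matmul_rank1([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    print(peak_gflops(40, 373))
```

Each pass of the outer loop touches one column of A and one row of B, so every entry of C can be updated independently within a pass; this is the property that makes the scheme attractive for an array of parallel processing elements.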

Discrete Mathematics | 2001

Realization of set functions as cut functions of graphs and hypergraphs

Satoru Fujishige; Sachin B. Patkar

We consider the problems of realizing set functions as cut functions on graphs and hypergraphs. We give necessary and sufficient conditions for set functions to be realized as cut functions on nonnegative networks, symmetric networks and nonnegative hypernetworks. The symmetry significantly simplifies the characterization of set functions of nonnegative network type. Set functions of nonnegative hypernetwork type generalize those of nonnegative network type (cut functions of ordinary networks) and are submodular functions. For any constant integer k ≥ 2, we can discern in polynomial time by a max-flow algorithm whether a given set function of order k is of nonnegative hypernetwork type.
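As a small illustration of the objects discussed in this abstract, the sketch below evaluates the cut function of a network and checks submodularity by brute force. It assumes a directed network given as a dictionary of arc capacities and defines the cut value of a vertex set X as the total capacity of arcs leaving X; the paper's exact conventions may differ, and all names and the example network are hypothetical. The exhaustive check is exponential in the number of vertices and is meant only for tiny examples.

```python
from itertools import combinations


def cut_value(arcs, X):
    """Total capacity of arcs (u, v) with u inside X and v outside X."""
    X = set(X)
    return sum(c for (u, v), c in arcs.items() if u in X and v not in X)


def subsets(vertices):
    """Yield every subset of the vertex set as a frozenset."""
    vs = list(vertices)
    for r in range(len(vs) + 1):
        for combo in combinations(vs, r):
            yield frozenset(combo)


def is_submodular(f, vertices, tol=1e-12):
    """Brute-force test of f(X) + f(Y) >= f(X | Y) + f(X & Y) for all X, Y."""
    all_sets = list(subsets(vertices))
    return all(
        f(X) + f(Y) + tol >= f(X | Y) + f(X & Y)
        for X in all_sets
        for Y in all_sets
    )


if __name__ == "__main__":
    # Hypothetical nonnegative network on three vertices.
    arcs = {("a", "b"): 2.0, ("b", "c"): 1.0, ("c", "a"): 3.0, ("a", "c"): 0.5}
    V = {"a", "b", "c"}
    print(cut_value(arcs, {"a"}))
    print(is_submodular(lambda X: cut_value(arcs, X), V))
```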

Discrete Applied Mathematics | 2003

Improving graph partitions using submodular functions

Sachin B. Patkar; H. Narayanan

We investigate the role of submodular functions in designing new heuristics and approximation algorithms for some NP-hard problems arising in the field of VLSI Design Automation. In particular, we design and implement an efficient heuristic for improving a bipartition of a graph in the sense of ratioCut (Discrete Appl. Math. 90 (1999) 3; 29th Annual Symposium on Foundations of Computer Science, 1988, p. 422). We also design an approximation algorithm for another NP-hard problem, which is a dual of the well-known NP-hard problem of finding a densest k-subgraph of a graph (see J. Algorithms 34 (2000) 203; Proceedings of the 34th Annual Symposium on Foundations of Computer Science, 1993, p. 692). Our algorithms are based on submodular functions and are implementable in polynomial time using efficient network-flow-based subroutines. To the best of our knowledge, our algorithms are the first to use a submodular-function-based approach for the problems considered here. We also describe experimental results that provide evidence in support of our heuristic for improving the ratioCut.

IEEE International Conference on High Performance Computing, Data, and Analytics | 2009

Acceleration of conjugate gradient method for circuit simulation using CUDA

Anirudh Maringanti; Viraj Athavale; Sachin B. Patkar

The Conjugate Gradient (CG) method is a popular iterative method for solving a system of linear equations and is used in a variety of applications. The DC Analyser is a circuit simulator built at IIT Bombay to solve large circuits containing resistances, voltage sources and current sources, and it employs the conjugate gradient method. The current generation of graphics cards offers extremely high raw processing power and memory bandwidth compared to conventional CPUs. We have accelerated the conjugate gradient part of the DC Analyser using an Nvidia GTX 280 GPU and the new CUDA technology, and obtained a speedup of over 10x for the CG method and more than 4x for the entire application for very large circuits when compared to a single-threaded CPU implementation.
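For readers unfamiliar with the method, here is a minimal single-threaded sketch of the textbook conjugate gradient iteration for a symmetric positive definite system. It is not the DC Analyser code and not a CUDA implementation; the dense list-of-rows matrix, the function names and the 2 x 2 example system are assumptions made for illustration only.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, which equals b when x = 0
    p = list(r)          # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:   # stop once the residual norm is small
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                       # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                            # direction update
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    # Hypothetical 2 x 2 symmetric positive definite system.
    print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

Each iteration consists of one matrix-vector product, two inner products and three vector updates, all of which are data-parallel operations and therefore natural candidates for offloading to a GPU.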

International Conference on VLSI Design | 2003

An efficient practical heuristic for good ratio-cut partitioning

Sachin B. Patkar; H. Narayanan

We present an efficient heuristic for finding good bipartitions of the vertex set of a graph in the sense of the well-known ratioCut measure (essentially the ratio between the weight of the cut edges and the product of the weights of the two node sets of the bipartition). The widely accepted ratioCut bipartitioning algorithm of Wei and Cheng is similar in spirit to the Fiduccia-Mattheyses algorithm (F-M algorithm). Our approach uses the F-M algorithm as its first phase, which takes random bipartitions as input. In the later phase of our algorithm, we use a new coarsening strategy and follow it with a submodular function optimization algorithm on the coarsened graph. We also compare the results of this approach on benchmark circuits with well-established algorithms such as the Wei-Cheng algorithm for ratioCut bipartitioning and pmetis from the Metis package. The comparative study shows not only that this new approach produces good-quality ratioCut bipartitions, but also that it has the potential to find a large number of such good partitions in comparison with other approaches. The key subroutine in our heuristic strategies is based on a recent finding about the role of submodular functions in designing new heuristics and approximation algorithms for some NP-hard problems.
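The ratioCut measure described in the abstract can be computed directly from its definition. The sketch below assumes an undirected graph stored as a dictionary of edge weights plus a dictionary of node weights; the function name, data layout and example graph are hypothetical, and the code only evaluates a given bipartition rather than implementing the paper's heuristic.

```python
def ratio_cut(edges, node_weight, part_a):
    """Weight of edges crossing the bipartition divided by the product
    of the total node weights of the two sides."""
    a = set(part_a)
    b = set(node_weight) - a
    if not a or not b:
        raise ValueError("both sides of the bipartition must be non-empty")
    # An edge is cut when exactly one of its endpoints lies in side a.
    cut = sum(w for (u, v), w in edges.items() if (u in a) != (v in a))
    weight_a = sum(node_weight[v] for v in a)
    weight_b = sum(node_weight[v] for v in b)
    return cut / (weight_a * weight_b)


if __name__ == "__main__":
    # Hypothetical path graph 1-2-3-4 with unit edge and node weights.
    edges = {(1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0}
    node_weight = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
    print(ratio_cut(edges, node_weight, {1, 2}))  # balanced split
    print(ratio_cut(edges, node_weight, {1}))     # unbalanced split
```

On this example both splits cut a single edge, but the balanced split has the larger denominator and hence the smaller ratio, which shows how the measure rewards balance without enforcing it as a hard constraint.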

International Symposium on Algorithms and Computation | 1992

Principal Lattice of Partition of submodular functions on Graphs: Fast algorithms for Principal Partition and Generic Rigidity

Sachin B. Patkar; H. Narayanan

In this paper we use a single unifying approach (which we call the Principal Lattice of Partitions approach) to construct simple and fast algorithms for problems including and related to the “Principal Partition” and the “Generic Rigidity” of graphs. Most of our algorithms are at least as fast as presently known algorithms for these problems, while our algorithm for the Principal Partition problem (complete partition and the partial orders for all critical values) runs in O(|E||V|^2 log^2 |V|) time and is the fastest known so far.

Journal of Computational Science | 2011

Solution of Partial Differential Equations by electrical analogy

Yogesh Dilip Save; H. Narayanan; Sachin B. Patkar

In this paper, PDEs are modeled by an electrical equivalent circuit generated from the equations arising from the finite element method (FEM). This allows the solution of PDEs to be obtained through circuit simulation. Our approach yields the same answer as FEM since the underlying equations are identical. The additional time required for the transformation into an electrical network is negligible, since it is done element by element; our experiments confirm this. Our approach naturally permits the simulation of coupled systems, where electrical/mechanical devices whose behaviour is governed by PDEs are connected together through an electrical circuit. Further, the approach permits a wide variety of electrical techniques, such as hybrid methods using both currents and voltages as unknowns, and parallelization techniques, such as multiport decomposition, for the solution of the problem. Simple test problems (electrostatic analysis of a p–n junction diode and a heat transfer problem) are analyzed by our circuit simulator to show the validity of the proposed approach. We also show that the approach works well for nonlinear PDEs.

International Symposium on Electronic System Design | 2012

FPGA Implementation of Particle Filter Based Object Tracking in Video

Sumeet Agrawal; Rajbabu Velmurugan; Sachin B. Patkar

There is a continuing need to increase computation speed with minimal resources in order to improve the performance of signal processing algorithms. This paper proposes an architecture and implementation of a modified color-histogram-based particle filter for object tracking in video. The architecture implements weight calculation and histogram calculation in a highly parallel form. The proposed architecture saves resources through effective memory utilization. The performance of the algorithm is demonstrated using a single-object scenario.

Microprocessors and Microsystems | 2013

A design methodology for optimally folded, pipelined architectures in VLSI applications using projective space lattices

Hrishikesh Sharma; Sachin B. Patkar

Semi-parallel, or folded, VLSI architectures are used whenever hardware resources need to be saved. Most recent applications that are based on Projective Geometry (PG) based balanced bipartite graphs also fall in this category. Many of these applications are being actively researched, especially in the areas of coding theory and matrix computations. Almost all of these applications need bipartite graphs of the order of tens of thousands in practice, whose nodes represent parallel processing. To reduce implementation cost, reducing the amount of hardware resources is an important engineering objective. In this paper, we provide a high-level, top-down design methodology to design optimal semi-parallel architectures for applications whose Data Flow Graph (DFG) is based on a PG bipartite graph. Unlike many other folding schemes, the topology of connections between physical element nodes does not change at runtime in this methodology. Hence the folding scheme achieves the best possible throughput, since there is no overhead of shuffling data across memories while scheduling another computation on the same processing unit. Another advantage is the ease of implementation. To lessen the throughput loss due to folding, we also incorporate a multi-tier pipelining strategy in the design methodology. A C++-based synthesis tool has been developed and tested for automatic generation of RTL models, and is publicly available. A specific high-performance design of a low-density parity check (LDPC) decoder based on this methodology was worked out in the past and is patent pending.

Foundations of Software Technology and Theoretical Computer Science | 1991

A fast algorithm for the principal partition of a graph

Sachin B. Patkar; H. Narayanan

We present an O(|E|^2 |V| log |V|) algorithm for the construction of the principal partition of a graph. The best known earlier algorithm for this problem is O(|E|^3 log |V|). Our approach differs from the earlier approaches in that it is node-partition based rather than edge-set based. We use flow maximisation as our basic subroutine.
