Kaushik Ravindran
University of California, Berkeley
Publication
Featured research published by Kaushik Ravindran.
international conference on multimedia and expo | 2007
Jike Chong; Nadathur Satish; Bryan Catanzaro; Kaushik Ravindran; Kurt Keutzer
The H.264 decoder has a sequential, control-intensive front end that makes it difficult to leverage the potential performance of emerging manycore processors. Preparsing is a functional parallelization technique to resolve this front-end bottleneck. However, the resulting parallel macroblock (MB) rendering tasks have highly input-dependent execution times and precedence constraints, which make them difficult to schedule efficiently on manycore processors. To address these issues, we propose a two-step approach: (i) a custom preparsing technique to resolve control dependencies in the input stream and expose MB-level data parallelism, (ii) an MB-level scheduling technique to allocate and load balance MB rendering tasks. The run-time MB-level scheduling increases the efficiency of parallel execution in the rest of the H.264 decoder, providing 60% speedup over greedy dynamic scheduling and 9-15% speedup over static compile-time scheduling for more than four processors. The preparsing technique coupled with run-time MB-level scheduling enables a potential 7x speedup for H.264 decoding.
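The greedy dynamic scheduling that the abstract uses as a baseline can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' scheduler: the function names `mb_deps` and `greedy_schedule` are hypothetical, and the dependency pattern (each MB waits on its left, top-left, top, and top-right neighbours) is an assumed simplification that the abstract does not spell out.

```python
import heapq

def mb_deps(rows, cols):
    """Assumed precedence pattern: each MB depends on its left, top-left,
    top, and top-right neighbours when they exist (hypothetical helper)."""
    deps = {}
    for r in range(rows):
        for c in range(cols):
            cand = [(r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1)]
            deps[(r, c)] = [(i, j) for i, j in cand
                            if 0 <= i < rows and 0 <= j < cols]
    return deps

def greedy_schedule(exec_time, deps, num_procs):
    """Simulate greedy dynamic scheduling: an idle processor immediately
    takes the oldest ready task. Returns (makespan, finish_times)."""
    remaining = {t: len(deps.get(t, ())) for t in exec_time}
    children = {t: [] for t in exec_time}
    for t, parents in deps.items():
        for p in parents:
            children[p].append(t)
    ready = sorted(t for t, n in remaining.items() if n == 0)
    running = []          # min-heap of (finish_time, task)
    finish = {}
    now, idle = 0.0, num_procs
    while ready or running:
        # Dispatch ready tasks while processors are idle.
        while ready and idle > 0:
            task = ready.pop(0)
            heapq.heappush(running, (now + exec_time[task], task))
            idle -= 1
        # Advance time to the next task completion.
        now, task = heapq.heappop(running)
        idle += 1
        finish[task] = now
        for child in children[task]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return now, finish

# Usage: a 2x3 grid of MBs with unit (illustrative) execution times.
deps = mb_deps(2, 3)
times = {mb: 1 for mb in deps}
makespan, _ = greedy_schedule(times, deps, num_procs=2)
print("makespan on 2 processors:", makespan)
```

With input-dependent execution times in place of the unit values, the same simulation shows how a greedy choice of ready task can leave processors idle behind a long-running MB, which is the inefficiency a load-balancing scheduler tries to avoid.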
international conference on computer aided design | 2003
Kaushik Ravindran; Andreas Kuehlmann; Ellen M. Sentovich
The application of general clock skew scheduling is practically limited due to the difficulties in implementing a wide spectrum of dedicated clock delays in a reliable manner. This results in a significant limitation of the optimization potential. As an alternative, the application of multiple clocking domains with dedicated phase shifts that are implemented by reliable, possibly expensive design structures can overcome these limitations and substantially increase the implementable optimization potential of clock adjustments. In this paper we present an algorithm for constrained clock skew scheduling which computes for a given number of clocking domains the optimal phase shifts for the domains and the assignment of the individual registers to the domains. For the within-domain latency values, the algorithm can assume a zero-skew clock delivery or apply a user-provided upper bound. Our experiments demonstrate that a constrained clock skew schedule using a few clocking domains combined with small within-domain latency can reliably implement the full sequential optimization potential to date only possible with an unconstrained clock schedule.
field-programmable logic and applications | 2005
Kaushik Ravindran; Nadathur Satish; Yujia Jin; Kurt Keutzer
To realize high performance, embedded applications are deployed on multiprocessor platforms tailored for an application domain. However, when a suitable platform is not available, only a few application niches can justify the increasing costs of an IC product design. An alternative is to design the multiprocessor on an FPGA. This retains the programmability advantage, while obviating the risks in producing silicon. This also opens FPGAs to the world of software designers. In this paper, we demonstrate the feasibility of FPGA-based multiprocessors for high performance applications. We deploy IPv4 packet forwarding on a multiprocessor on the Xilinx Virtex-II Pro FPGA. The design achieves a 1.8 Gbps throughput and loses only 2.6X in performance (normalized to area) compared to an implementation on the Intel IXP-2800 network processor. We also develop a design space exploration framework using integer linear programming to explore multiprocessor configurations for an application. Using this framework, we achieve a more efficient multiprocessor design surpassing the performance of our hand-tuned solution for packet forwarding.
IEEE Micro | 2004
Niraj Shah; William Plishker; Kaushik Ravindran; Kurt Keutzer
Application-specific integrated circuit (ASIC) design is too risky and prohibitively expensive for many applications. This trend, combined with increasing silicon capability on a die, is fueling the emergence of application-specific programmable architectures. This focus on architecture design for network processors has made programming them an arduous task. Current network processors require in-depth knowledge of the architecture just to begin programming the device. However, for network processors to succeed, programmers must efficiently implement high-performance applications on them. Writing high-performance code for modern network processors is difficult because of their complexity. NP-Click is a simple programming model that permits programmers to reap the benefits of a domain specific language while still allowing for target-specific optimizations. Results for the Intel IXP1200 indicate that NP-Click delivers a large productivity gain at a slight performance expense.
design, automation, and test in europe | 2007
Nadathur Satish; Kaushik Ravindran; Kurt Keutzer
The paper presents a decomposition strategy to speed up constraint optimization for a representative multiprocessor scheduling problem. In the manner of Benders decomposition, our technique solves relaxed versions of the problem and iteratively learns constraints to prune the solution space. Typical formulations suffer prohibitive run times even on medium-sized problems with fewer than 30 tasks. Our decomposition strategy enhances constraint optimization to robustly handle instances with over 100 tasks. Moreover, the extensibility of constraint formulations permits realistic application and resource constraints, which is a limitation of common heuristic methods for scheduling. The inherent extensibility, coupled with improved run times from a decomposition strategy, posits constraint optimization as a powerful tool for resource constrained scheduling and multiprocessor design space exploration.
embedded software | 2008
Nadathur Satish; Kaushik Ravindran; Kurt Keutzer
We present a statistical optimization approach for scheduling a task dependence graph with variable task execution times onto a heterogeneous multiprocessor system. Scheduling methods in the presence of variations typically rely on worst-case timing estimates for hard real-time applications, or average-case analysis for other applications. However, a large class of soft real-time applications requires only statistical guarantees on latency and throughput. We present a general statistical model that captures the probability distributions of task execution times as well as the correlations of execution times of different tasks. We use a Monte Carlo based technique to perform makespan analysis of different schedules based on this model. This approach can be used to analyze the variability present in a variety of soft real-time applications, including an H.264 video processing application. We present two scheduling algorithms based on statistical makespan analysis. The first is a heuristic based on a critical path analysis of the task dependence graph. The other is a simulated annealing algorithm using incremental timing analysis. Both algorithms take as input the required statistical guarantee, and can thus be easily re-used for different required guarantees. We show that optimization methods based on statistical analysis show a 25-30% improvement in makespan over methods based on static worst-case analysis.
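The Monte Carlo makespan analysis mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `makespan`, `makespan_at_guarantee`, and `correlated_sample` are hypothetical, the schedule is given as a fixed task-to-processor assignment plus a dependency-respecting order, and the duration model (a shared load factor that correlates all tasks, times a per-task jitter) is invented for the example.

```python
import random

def makespan(durations, deps, assignment, order):
    """Finish time of a fixed schedule for one sample of task durations.
    `order` must respect the dependencies; each processor runs its
    tasks in that order."""
    finish, proc_free = {}, {}
    for task in order:
        proc = assignment[task]
        start = max([finish[d] for d in deps.get(task, ())]
                    + [proc_free.get(proc, 0.0)])
        finish[task] = start + durations[task]
        proc_free[proc] = finish[task]
    return max(finish.values())

def makespan_at_guarantee(sample_durations, deps, assignment, order,
                          guarantee=0.95, trials=2000, seed=0):
    """Estimate the makespan that is met with probability `guarantee`
    by sampling durations and taking the empirical quantile."""
    rng = random.Random(seed)
    samples = sorted(
        makespan(sample_durations(rng), deps, assignment, order)
        for _ in range(trials))
    return samples[min(trials - 1, int(guarantee * trials))]

def correlated_sample(rng):
    """Assumed duration model: a shared factor correlates all tasks."""
    base = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 4.0}
    shared = rng.uniform(0.8, 1.5)
    return {t: b * shared * rng.uniform(0.9, 1.1) for t, b in base.items()}

# Usage: four tasks on two processors (illustrative values only).
deps = {"c": ["a", "b"], "d": ["c"]}
assignment = {"a": 0, "b": 1, "c": 0, "d": 1}
order = ["a", "b", "c", "d"]
for g in (0.5, 0.95):
    est = makespan_at_guarantee(correlated_sample, deps, assignment, order, g)
    print(f"makespan met with probability {g}: {est:.2f}")
```

Because the required guarantee is just a parameter of the quantile, the same analysis can compare candidate schedules at different guarantee levels without changing the sampling code.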
field programmable gate arrays | 2005
Yujia Jin; William Plishker; Kaushik Ravindran; Nadathur Satish; Kurt Keutzer
Modern network applications require devices that provide high performance at gigabit line rates with the flexibility to support diverse application standards and services. However, prohibitive product design costs and shrinking market windows restrict the number of ASIC/ASSP design starts to only selective application niches. The FPGA is an alternate cost-effective medium for many applications. However, creating a system solution through programming an FPGA with an HDL is only attractive to a few application developers. Designers would prefer a software solution if it can meet their design constraints. An alternative is an FPGA-based soft multiprocessor system: a programmable multiprocessor on the FPGA composed of a network of hard and soft processing cores. Soft multiprocessors provide a software-level abstraction for programming an FPGA. In this study, we evaluate the feasibility and effectiveness of soft multiprocessor systems. We compare a soft multiprocessor implementation against FPGA hardware and an ASSP implementation for two network applications: IPv4 packet forwarding and Network Address Translation (NAT). Our study indicates that soft multiprocessors lose only a factor of 2X in performance compared to a custom ASSP, while providing great savings in terms of silicon development costs. Moreover, the presence of architectures and development environments for soft multiprocessor systems could open the FPGA market to the much larger world of embedded system software designers.
Archive | 2004
William Plishker; Kaushik Ravindran; Niraj Shah; Kurt Keutzer
Archive | 2008
Kurt Keutzer; Kaushik Ravindran
Archive | 2005
Niraj Shah; William Plishker; Kaushik Ravindran; Matthias Gries; Scott J. Weber; Andrew Mihal; Chidamber Kulkarni; Matthew Moskewicz; Christian Sauer; Kurt Keutzer