Publication


Featured research published by Vadim Sheinin.


International Symposium on Biomedical Imaging | 2007

Real-Time Mutual-Information-Based Linear Registration on the Cell Broadband Engine Processor

Moriyoshi Ohara; Hangu Yeo; F. Savino; Giridharan Iyengar; Leiguang Gong; Hiroshi Inoue; Hideaki Komatsu; Vadim Sheinin; S. Daijavad; Bradley J. Erickson

Emerging multi-core processors are able to accelerate medical imaging applications by exploiting the parallelism available in their algorithms. We have implemented a mutual-information-based 3D linear registration algorithm on the Cell Broadband Engine™ (CBE) processor, which has nine processor cores on a chip, each with a 4-way SIMD unit. By exploiting the highly parallel architecture and its high memory bandwidth, our implementation with two CBE processors can compute mutual information for about 33 million pixel pairs in a second. This implementation is significantly faster than a conventional one on a traditional microprocessor, and even faster than a previously reported custom-hardware implementation. As a result, it can register a pair of 256×256×30 3D images in one second by using a multi-resolution method. This paper describes our implementation with a focus on localized sampling and speculative packing techniques, which reduce memory traffic by 82%.
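The similarity measure named in this abstract, mutual information between paired intensities, can be illustrated with a minimal Python sketch. This is not the paper's Cell implementation: the function name `mutual_information`, the `bins` and `max_value` parameters, and the plain joint-histogram estimator are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=8, max_value=256):
    """Estimate mutual information (in bits) between two equal-length
    sequences of pixel intensities using a joint histogram.

    Hypothetical helper for illustration; `bins` and `max_value` are
    assumed parameters, not values taken from the paper.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    width = max_value / bins
    # Joint histogram over quantized intensity pairs.
    joint = Counter((int(x // width), int(y // width)) for x, y in zip(a, b))
    n = len(a)
    # Marginal histograms derived from the joint histogram.
    hist_a = Counter()
    hist_b = Counter()
    for (i, j), count in joint.items():
        hist_a[i] += count
        hist_b[j] += count
    mi = 0.0
    for (i, j), count in joint.items():
        # p(i, j) * log2(p(i, j) / (p(i) * p(j))), written with raw counts.
        mi += (count / n) * math.log2(count * n / (hist_a[i] * hist_b[j]))
    return mi
```

A registration loop would evaluate this measure for each candidate transform and keep the transform that maximizes it; the joint-histogram update is the memory-intensive step that the abstract's sampling and packing techniques target.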


Visual Communications and Image Processing | 2008

Wyner-Ziv video compression using rateless LDPC codes

Dake He; Ashish Jagmohan; Ligang Lu; Vadim Sheinin

In this paper we consider Wyner-Ziv video compression using rateless LDPC codes. It is shown that the advantages of using rateless LDPC codes in Wyner-Ziv video compression, in comparison to using traditional fixed-rate LDPC codes, are at least threefold: 1) it significantly reduces the storage complexity; 2) it allows seamless integration with mode selection; and 3) it greatly improves the overall system performance. Experimental results on the standard CIF-sized sequence mobile_and_calendar show that by combining rateless LDPC coding with simple skip mode selection, one can build a Wyner-Ziv video compression system that is, at a rate of 0.2 bits per pixel, about 2.25 dB away from the standard JM software implementation of the H.264 main profile, more than 8.5 dB better than H.264 Intra, where all frames are H.264-coded intra-predicted frames, and about 2.3 dB better than the same Wyner-Ziv system using fixed-rate LDPC coding. In terms of encoding complexity, the Wyner-Ziv video compression system is two orders of magnitude less complex than the JM implementation of the H.264 main profile.


International Conference on Multimedia and Expo | 2006

Video Analysis and Compression on the STI Cell Broadband Engine Processor

Lurng-Kuo Liu; Sreeni Kesavarapu; Jonathan H. Connell; Ashish Jagmohan; Lark-hoon Leem; Brent Paulovicks; Vadim Sheinin; Lijung Tang; Hangu Yeo

With increased concern for physical security, video surveillance is becoming an important business area. Similar camera-based systems can also be used in such diverse applications as retail-store shopper motion analysis and casino behavioral policy monitoring. There are two aspects of video surveillance that require significant computing power: image analysis for detecting objects, and video compression for digital storage. The new STI Cell Broadband Engine (CBE) processor is an appealing platform for such applications because it incorporates 8 separate high-speed processing cores with an aggregate performance of 256 GFLOPS. Moreover, this chip is the heart of the new Sony PlayStation 3 and can be expected to be relatively inexpensive due to the high volume of production. In this paper we show how object detection and compression can be implemented on the CBE, discuss the difficulties encountered in porting the code, and provide performance results demonstrating significant speed-up.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Uniform Threshold Scalar Quantizer Performance in Wyner-Ziv Coding With Memoryless, Additive Laplacian Correlation Channel

Vadim Sheinin; Ashish Jagmohan; Dake He

The performance of a uniform-threshold scalar quantizer in Wyner-Ziv coding is investigated in this paper. To derive analytical expressions, we assume the abstract correlation channel from the side information to the source to be encoded is memoryless, additive Laplacian. Furthermore, in order to focus our attention on the performance of the quantizer, the Wyner-Ziv coding scheme is assumed to encode the quantizer output by using perfect Slepian-Wolf coding. Analytical expressions for the operational rate-distortion function are obtained for this case. By evaluating these analytical expressions, we show that scalar quantization with a mid-tread uniform threshold quantizer, followed by perfect Slepian-Wolf coding, achieves performance which is close to the theoretical Wyner-Ziv rate-distortion bound at low rates.
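To make the term concrete, a mid-tread uniform threshold quantizer can be sketched in a few lines of Python. The function names and the `step` parameter are hypothetical, and the decoder rule shown (clamping the side information into the decoded bin) is a common simple heuristic assumed here for illustration; it is not claimed to be the reconstruction analyzed in the paper.

```python
import math


def quantize_mid_tread(x, step):
    """Return the index of the mid-tread bin containing x.

    Thresholds sit at (k + 0.5) * step, so bin 0 is centered on zero.
    """
    return math.floor(x / step + 0.5)


def bin_bounds(index, step):
    """Lower and upper thresholds of the bin with the given index."""
    return ((index - 0.5) * step, (index + 0.5) * step)


def reconstruct_with_side_info(index, side_info, step):
    """Assumed decoder rule: clamp the side information into the decoded bin."""
    lo, hi = bin_bounds(index, step)
    return min(max(side_info, lo), hi)
```

In a Wyner-Ziv setting only the bin index is coded, at a rate conditioned on the side information, and the decoder combines that index with its side information to form the reconstruction.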


IBM Journal of Research and Development | 2010

OpenCL and parallel primitives for digital TV applications

Seung Mo Cho; Dong-Woo Im; Oh-Young Jang; Hyo Jung Song; Brent Paulovicks; Vadim Sheinin; Hangu Yeo

Open Computing Language® (OpenCL®), which was created to support parallel programming of heterogeneous multicore-processor systems, has a very large potential for high-performance computing and consumer electronics since it provides application programming interfaces (APIs) to help make portable code that runs across multiple devices. OpenCL is still under development, and it is not clear whether OpenCL has any advantages over other frameworks aside from portability. The purpose of our project was to define evaluation criteria (e.g., performance, productivity, and portability criteria), empirically evaluate OpenCL as a programming framework using those criteria, define and implement parallel primitives in OpenCL, and demonstrate how the use of the implemented parallel primitives can benefit our target applications. Parallel primitive library APIs are defined to implement parallel algorithms in OpenCL, and a set of data- and task-parallel primitives is implemented and incorporated in the target applications. Multicore central processing units, the Cell Broadband Engine® (Cell/B.E.®), and graphics processing units are used as target platforms, and digital TV applications are used to evaluate the usefulness of OpenCL. Preliminary results show that parallel primitives can be one of the ways to improve application performance and programmer productivity with respect to OpenCL while still maintaining software portability.
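The abstract does not list which primitives the library provides. As a rough illustration of what a data-parallel primitive looks like to the application programmer, the following Python sketch (not OpenCL) defines a map and a chunked reduce on top of a thread pool. The names `parallel_map` and `parallel_reduce` and their parameters are hypothetical and are not the paper's API.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def parallel_map(fn, items, workers=4):
    """Apply fn to every item using a pool of workers; result order is preserved."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


def parallel_reduce(op, items, identity, workers=4, chunk=1024):
    """Reduce items with an associative op.

    Each chunk is reduced independently by a worker, then the partial
    results are combined sequentially. `identity` must be the neutral
    element of op.
    """
    chunks = [items[i:i + chunk] for i in range(0, len(items), chunk)]
    partials = parallel_map(lambda c: reduce(op, c, identity), chunks, workers)
    return reduce(op, partials, identity)
```

The point of such primitives is that the application states what to compute per element or per chunk, while the library decides how the work is split across the available cores or devices.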


International Conference on Multimedia and Expo | 2007

Motion Estimation with Similarity Constraint and its Application to Distributed Video Coding

Ligang Lu; Vadim Sheinin

In this paper we present a new motion estimation scheme that minimizes the objective distance function under a similarity-measure constraint, in order to exploit the motion correlation among adjacent pixel blocks with similar statistical features. We formulate this correlation as a similarity measure on the motion vectors between the current pixel block and its neighboring blocks, weighted by the corresponding statistical similarity. We then use this similarity measure as a constraint in the objective distance function to reduce noise effects and improve performance by effectively trading off the difference in pixel values against the smoothness of adjacent motion vectors. Thus, in motion estimation, our new scheme not only minimizes the pixel differences but also tries to preserve the motion smoothness among statistically similar neighbors. We applied this new motion estimation scheme to a distributed video coding system for side information generation for Wyner-Ziv decoding and compared its performance to the scheme without the similarity constraint. The results show that our motion estimation scheme can achieve significant gains in the fidelity of the side information and the decoded Wyner-Ziv frames over the scheme without the similarity constraint.
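The trade-off described here can be sketched as a penalized cost: the block-matching distortion plus a weighted penalty on how far a candidate motion vector departs from those of its neighbors. The cost form, the L1 vector distance, the weight `lam`, and the function names below are assumptions for illustration; the abstract does not give the paper's exact formulation.

```python
def constrained_cost(sad, mv, neighbors, lam):
    """Distortion plus a weighted smoothness penalty.

    sad       -- pixel-difference cost (e.g. sum of absolute differences) for mv
    mv        -- candidate motion vector as (dx, dy)
    neighbors -- list of ((dx, dy), weight) pairs, where weight reflects how
                 statistically similar the neighboring block is (assumed given)
    lam       -- hypothetical trade-off factor between distortion and smoothness
    """
    penalty = sum(
        weight * (abs(mv[0] - n_mv[0]) + abs(mv[1] - n_mv[1]))
        for n_mv, weight in neighbors
    )
    return sad + lam * penalty


def pick_motion_vector(sad_by_mv, neighbors, lam):
    """Choose the candidate vector with the lowest constrained cost."""
    return min(
        sad_by_mv,
        key=lambda mv: constrained_cost(sad_by_mv[mv], mv, neighbors, lam),
    )
```

With `lam` set to zero this reduces to plain block matching; a larger `lam` favors vectors that agree with statistically similar neighbors even when their pixel difference is slightly higher.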


Proceedings of SPIE | 2011

Accelerating statistical image reconstruction algorithms for fan-beam x-ray CT using cloud computing

Somesh Srivastava; A. Ravishankar Rao; Vadim Sheinin

Statistical image reconstruction algorithms potentially offer many advantages to x-ray computed tomography (CT), e.g. lower radiation dose. However, their adoption in practical CT scanners requires extra computation power, which is traditionally provided by incorporating additional computing hardware (e.g. CPU clusters, GPUs, FPGAs) into a scanner. An alternative solution is to access the required computation power over the internet from a cloud computing service, which is orders of magnitude more cost-effective. This is because users only pay a small pay-as-you-go fee for the computation resources used (i.e. CPU time, storage, etc.), and completely avoid purchase, maintenance and upgrade costs. In this paper, we investigate the benefits and shortcomings of using cloud computing for statistical image reconstruction. We parallelized the most time-consuming parts of our application, the forward and back projectors, using MapReduce, the standard parallelization library on clouds. From preliminary investigations, we found that a large speedup is possible at a very low cost. However, communication overheads inside MapReduce can limit the maximum speedup, and a better MapReduce implementation might become necessary in the future. All the experiments for this paper, including development and testing, were completed on the Amazon Elastic Compute Cloud (EC2) for less than $20.


International Conference on Multimedia and Expo | 2007

Accelerating Mutual-Information-Based Linear Registration on the Cell Broadband Engine Processor

Moriyoshi Ohara; Hangu Yeo; Frank Savino; Giridharan Iyengar; Leiguang Gong; Hiroshi Inoue; Hideaki Komatsu; Vadim Sheinin; Shahrokh Daijavad

Emerging multi-core processors are able to accelerate medical imaging applications by exploiting the parallelism available in their algorithms. We have implemented a mutual-information-based 3D linear registration algorithm on the Cell Broadband Engine™ (CBE) processor. By exploiting the highly parallel architecture and its high memory bandwidth, our implementation with two CBE processors can register a pair of 256×256×30 3D images in one second. This implementation is significantly faster than a conventional one on a traditional microprocessor, and even faster than a previously reported custom-hardware implementation. In addition to parallelizing the code for multiple cores and organizing the data structure to reduce memory traffic, it is also critical to optimize the code for the SIMD pipeline structure. We note that code optimization for the SIMD pipeline alone results in a 4.2x-8.7x acceleration for the computation of small kernels. Further, SIMD optimization alone results in a 4.5x end-to-end application speedup.


Visual Communications and Image Processing | 2004

Video coding for decoding power-constrained embedded devices

Ligang Lu; Vadim Sheinin

Low power dissipation and fast processing time are crucial requirements for embedded multimedia devices. This paper presents a video coding technique that decreases the power consumption of a standard video decoder. Coupled with a small dedicated video internal memory cache on the decoder, the technique can substantially decrease the amount of data traffic to the external memory at the decoder. A decrease in data traffic to the external memory at the decoder yields multiple benefits: faster real-time processing and power savings. The encoder, given prior knowledge of the decoder’s dedicated video internal memory cache management scheme, regulates its choice of motion-compensated predictors to reduce the decoder’s external memory accesses. This technique can be used in any standard or proprietary encoder scheme to generate a compliant output bit stream decodable by standard CPU-based and dedicated hardware-based decoders, giving power savings with the best quality-power cost trade-off. Our simulation results show that with a relatively small amount of dedicated video internal memory cache, the technique may decrease the traffic between the CPU and external memory by over 50%.


International Conference on Algorithms and Architectures for Parallel Processing | 2011

Parallel implementation of external sort and join operations on a multi-core network-optimized system on a chip

Elahe Khorasani; Brent Paulovicks; Vadim Sheinin; Hangu Yeo
