Network


Latest external collaboration at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Ekaterina Gonina is active.

Publication


Featured research published by Ekaterina Gonina.


IEEE Signal Processing Magazine | 2009

Parallel scalability in speech recognition

Kisun You; Jike Chong; Youngmin Yi; Ekaterina Gonina; Christopher J. Hughes; Yen-Kuang Chen; Wonyong Sung; Kurt Keutzer

We propose four application-level implementation alternatives called algorithm styles and construct highly optimized implementations on two parallel platforms: an Intel Core i7 multicore processor and an NVIDIA GTX280 manycore processor. The highest performing algorithm style varies with the implementation platform. On a 44-min speech data set, we demonstrate substantial speedups of 3.4× on Core i7 and 10.5× on GTX280 compared to a highly optimized sequential implementation on Core i7 without sacrificing accuracy. The parallel implementations contain less than 2.5% sequential overhead, promising scalability and significant potential for further speedup on future platforms.


International Conference on Multimedia and Expo | 2009

Scalable HMM based inference engine in large vocabulary continuous speech recognition

Jike Chong; Kisun You; Youngmin Yi; Ekaterina Gonina; Christopher J. Hughes; Wonyong Sung; Kurt Keutzer

Parallel scalability allows an application to efficiently utilize an increasing number of processing elements. In this paper we explore a design space for application scalability for an inference engine in large vocabulary continuous speech recognition (LVCSR). Our implementation of the inference engine involves a parallel graph traversal through an irregular graph-based knowledge network with millions of states and arcs. The challenge is not only to define a software architecture that exposes sufficient fine-grained application concurrency, but also to efficiently synchronize between an increasing number of concurrent tasks and to effectively utilize the parallelism opportunities in today's highly parallel processors. We propose four application-level implementation alternatives we call “algorithm styles”, and construct highly optimized implementations on two parallel platforms: an Intel Core i7 multicore processor and an NVIDIA GTX280 manycore processor. The highest performing algorithm style varies with the implementation platform. On a 44-minute speech data set, we demonstrate substantial speedups of 3.4× on Core i7 and 10.5× on GTX280 compared to a highly optimized sequential implementation on Core i7 without sacrificing accuracy. The parallel implementations contain less than 2.5% sequential overhead, promising scalability and significant potential for further speedup on future platforms.


GPU Computing Gems Emerald Edition | 2011

Efficient Automatic Speech Recognition on the GPU

Jike Chong; Ekaterina Gonina; Kurt Keutzer

This chapter provides an understanding of specific implementation challenges when working with the speech inference process, weighted finite state transducer (WFST) based methods, and the Viterbi algorithm. It illustrates an efficient reference implementation on the GPU that could be productively customized to meet the needs of specific usage scenarios. Automatic speech recognition (ASR) allows multimedia content to be transcribed from acoustic waveforms to word sequences. This technology is emerging as a critical component in data analytics for the wealth of media data being generated every day. ASR is a challenging application to parallelize. Specifically, on the GPU an efficient implementation of ASR involves resolving a series of implementation challenges specific to the data-parallel architecture of the platform. There are efficient solutions for resolving the implementation challenges of speech recognition on the GPU that achieve more than an order of magnitude speedup compared to sequential execution. This chapter identifies and resolves four types of algorithmic challenges encountered in the implementation of speech recognition on GPUs. The techniques presented here, when used together, are capable of delivering a 10.6× speedup for this challenging application when compared to an optimized sequential implementation on the CPU. This kind of application framework provides an optimized infrastructure that incorporates all the techniques discussed in this chapter to allow efficient execution of the speech inference process on the GPU.
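The core computation the chapter discusses is Viterbi decoding through a recognition network. As a rough illustration only, here is a minimal NumPy sketch of Viterbi decoding for a toy HMM (the function name and toy setup are illustrative assumptions, not from the chapter; the actual GPU implementation traverses WFSTs with millions of states and evaluates the per-state maximization below in parallel):

```python
import numpy as np

def viterbi(log_A, log_B, log_pi, obs):
    """Most likely hidden-state sequence for an observation sequence.

    log_A:  (S, S) log transition probabilities
    log_B:  (S, V) log emission probabilities
    log_pi: (S,)   log initial state probabilities
    obs:    list of observation indices
    """
    S = log_A.shape[0]
    T = len(obs)
    delta = np.empty((T, S))            # best log-prob ending in each state
    psi = np.empty((T, S), dtype=int)   # back-pointers for path recovery
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        # scores[i, j] = delta[t-1, i] + log_A[i, j]; the max over i is the
        # step a data-parallel implementation evaluates for all states at once
        scores = delta[t - 1][:, None] + log_A
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```

The inner loop is sequential over time frames but parallel across states, which is exactly the fine-grained concurrency the chapter's GPU implementation exploits.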


IEEE Automatic Speech Recognition and Understanding Workshop | 2011

Fast speaker diarization using a high-level scripting language

Ekaterina Gonina; Gerald Friedland; Henry Cook; Kurt Keutzer

Most current speaker diarization systems use agglomerative clustering of Gaussian Mixture Models (GMMs) to determine “who spoke when” in an audio recording. While state-of-the-art in accuracy, this method is computationally costly, mostly due to the GMM training, and thus limits current approaches to roughly real-time performance. Increased sizes of current datasets require processing of hundreds of hours of data and thus make more efficient processing methods highly desirable. With the emergence of highly parallel multicore and manycore processors, such as graphics processing units (GPUs), one can re-implement GMM training to achieve faster than real-time performance by taking advantage of parallelism in the training computation. However, developing and maintaining the complex low-level GPU code is difficult and requires a deep understanding of the hardware architecture of the parallel processor. Furthermore, such low-level implementations are not readily reusable in other applications and not portable to other platforms, limiting programmer productivity. In this paper we present a speaker diarization system captured in under 50 lines of Python that achieves 50–250× faster than real-time performance by using a specialization framework to automatically map and execute computationally intensive GMM training on an NVIDIA GPU, without significant loss in accuracy.
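The computational core being offloaded here is GMM training via expectation-maximization. As a sketch of what that kernel computes, here is one EM iteration for a diagonal-covariance GMM in plain NumPy (the function name and simplifications are illustrative assumptions; the paper's specialization framework emits optimized GPU code for this computation rather than running NumPy):

```python
import numpy as np

def gmm_em_step(X, weights, means, variances):
    """One EM iteration for a diagonal-covariance GMM.

    X: (N, D) feature frames; weights: (K,); means, variances: (K, D).
    Returns updated (weights, means, variances).
    """
    N, D = X.shape
    # E-step: per-frame, per-component log-likelihoods (parallel over N*K).
    diff = X[:, None, :] - means[None, :, :]                    # (N, K, D)
    log_prob = (-0.5 * np.sum(diff**2 / variances
                              + np.log(2 * np.pi * variances), axis=2)
                + np.log(weights))                              # (N, K)
    log_norm = np.logaddexp.reduce(log_prob, axis=1, keepdims=True)
    resp = np.exp(log_prob - log_norm)       # responsibilities, rows sum to 1
    # M-step: weighted sums over frames (parallel reductions on a GPU).
    Nk = resp.sum(axis=0) + 1e-10
    new_means = resp.T @ X / Nk[:, None]
    new_vars = resp.T @ (X**2) / Nk[:, None] - new_means**2 + 1e-6
    return Nk / N, new_means, new_vars
```

The E-step is independent per frame and the M-step is a set of reductions, which is why GMM training maps so well onto manycore hardware.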


ACM Transactions on Multimedia Computing, Communications, and Applications | 2014

Scalable multimedia content analysis on parallel platforms using python

Ekaterina Gonina; Gerald Friedland; Eric Battenberg; Penporn Koanantakool; Michael B. Driscoll; Evangelos Georganas; Kurt Keutzer

In this new era dominated by consumer-produced media there is a high demand for web-scalable solutions to multimedia content analysis. A compelling approach to making applications scalable is to explicitly map their computation onto parallel platforms. However, developing efficient parallel implementations and fully utilizing the available resources remains a challenge due to the increased code complexity, limited portability and required low-level knowledge of the underlying hardware. In this article, we present PyCASP, a Python-based framework that automatically maps computation from Python application code onto a variety of parallel platforms. PyCASP is designed using a systematic, pattern-oriented approach to offer a single software development environment for multimedia content analysis applications. Using PyCASP, applications can be prototyped in a couple hundred lines of Python code and automatically scale to modern parallel processors. Applications written with PyCASP are portable to a variety of parallel platforms and efficiently scale from a single desktop Graphics Processing Unit (GPU) to an entire cluster with a small change to application code. To illustrate our approach, we present three multimedia content analysis applications that use our framework: a state-of-the-art speaker diarization application, a content-based music recommendation system based on the Million Song Dataset, and a video event detection system for consumer-produced videos. We show that across this wide range of applications, our approach achieves the goal of automatic portability and scalability while at the same time allowing easy prototyping in a high-level language and efficient performance of low-level optimized code.


Proceedings of the second international workshop on MapReduce and its applications | 2011

Parallelizing large-scale data processing applications with data skew: a case study in product-offer matching

Ekaterina Gonina; Anitha Kannan; John C. Shafer; Mihai Budiu

The last decade has seen a surge of interest in large-scale data-parallel processing engines. While these engines share many features in common with parallel databases, they make a set of different trade-offs. In consequence, many of the lessons learned for programming parallel databases have to be re-learned in the new environment. In this paper we show a case study of parallelizing an example large-scale application (offer matching, a core part of online shopping) on an example MapReduce-based distributed computation engine (DryadLINQ). We focus on the challenges raised by the nature of large data sets and data skew and show how they can be addressed effectively within this computation framework by optimizing the computation to adapt to the nature of the data. In particular we describe three different strategies for performing distributed joins and show how the platform language allows us to implement optimization strategies at the application level, without system support. We show that this flexibility in the programming model allows for a highly effective system, providing a measured speedup of more than 100 on 64 machines (256 cores), and an estimated speedup of 200 on 1280 machines (5120 cores) for matching 4 million offers.
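The paper's three join strategies are DryadLINQ-specific, but the basic trade-off can be sketched in a single-process Python toy: hash-repartitioning both sides versus broadcasting the small side (all names here are illustrative assumptions, not the paper's code). Under data skew, a hot key overloads one partition in the repartitioned join, while a broadcast join leaves the large side spread across workers:

```python
from collections import defaultdict

def partitioned_join(offers, products, key, n_workers):
    """Repartition both sides by hash(key); each bucket joins independently.
    A heavily skewed key sends all its rows to one bucket (one worker)."""
    buckets = [([], []) for _ in range(n_workers)]
    for o in offers:
        buckets[hash(o[key]) % n_workers][0].append(o)
    for p in products:
        buckets[hash(p[key]) % n_workers][1].append(p)
    out = []
    for off, prod in buckets:            # each bucket would run on one worker
        index = defaultdict(list)
        for p in prod:
            index[p[key]].append(p)
        for o in off:
            for p in index[o[key]]:
                out.append((o, p))
    return out

def broadcast_join(offers, products, key):
    """Ship the small side (products) to every worker; the large, skewed
    offer side is never repartitioned, so hot keys stay spread out."""
    index = defaultdict(list)
    for p in products:
        index[p[key]].append(p)
    return [(o, p) for o in offers for p in index[o[key]]]
```

Choosing between such strategies per key distribution, at the application level, is the kind of optimization the paper implements without engine support.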


Multiprocessor System-on-Chip | 2011

PALLAS: Mapping Applications onto Manycore

Michael J. Anderson; Bryan Catanzaro; Jike Chong; Ekaterina Gonina; Kurt Keutzer; Chao-Yue Lai; Mark Murphy; Bor-Yiing Su; Narayanan Sundaram

Parallel programming using the current state-of-the-art in software engineering techniques is hard. Expertise in parallel programming is necessary to deliver good performance in applications; however, it is very common that domain experts lack the requisite expertise in parallel programming. In order to drive computer science research toward effectively using the available parallel hardware platforms, it is very important to make parallel programming systematic and productive. We believe that the key to designing parallel programs in a systematic way is software architecture, and the key to improving the productivity of developing parallel programs is software frameworks. The basis of both is design patterns and a pattern language.


Proceedings of the 2010 Workshop on Parallel Programming Patterns | 2010

Monte Carlo methods: a computational pattern for our pattern language

Jike Chong; Ekaterina Gonina; Kurt Keutzer

Monte Carlo methods are an important set of algorithms in computer science. They involve estimating results by statistically sampling a parameter space with thousands to millions of experiments. The algorithm requires a small set of parameters as input, with which it generates a large amount of computation, and outputs a concise set of aggregated results. The large amount of computation has many independent components with obvious boundaries for parallelization. While the algorithm is well-suited for executing on a highly parallel computing platform, there still exist many challenges, such as: selecting a suitable random number generator with the appropriate statistical and computational properties, selecting a suitable distribution conversion method that preserves the statistical properties of the random sequences, leveraging the right abstraction for the computation in the experiments, and designing efficient data structures for a particular data working set. This paper presents the Monte Carlo Methods software programming pattern and focuses on the numerical, task, and data perspectives to guide software developers in constructing efficient implementations of applications based on Monte Carlo methods.
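As a minimal instance of the pattern (an illustrative example, not taken from the paper), here is the classic Monte Carlo estimate of π: a small parameter set in, many independent experiments, and a concise aggregate out. Each sample is independent, which is the parallelization boundary the pattern describes:

```python
import random

def estimate_pi(n_samples, seed=0):
    """Estimate pi by sampling points in the unit square and counting
    how many fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)   # RNG choice matters, per the pattern
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()   # uniform point in [0,1)^2
        if x * x + y * y <= 1.0:            # inside the quarter circle?
            hits += 1
    # Area of quarter circle / area of square = pi/4.
    return 4.0 * hits / n_samples
```

In a parallel implementation, each worker would run its own seeded stream of experiments and only the hit counts would be aggregated, keeping communication minimal.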


Conference of the International Speech Communication Association | 2009

A fully data parallel WFST-based large vocabulary continuous speech recognition on a graphics processing unit

Jike Chong; Ekaterina Gonina; Youngmin Yi; Kurt Keutzer


Archive | 2010

Method and system for parallel statistical inference on highly parallel platforms

Jike Chong; Youngmin Yi; Ekaterina Gonina

Collaboration


Dive into Ekaterina Gonina's collaboration.

Top Co-Authors

Kurt Keutzer
University of California

Jike Chong
University of California

Kisun You
Seoul National University

Youngmin Yi
Seoul National University

Gerald Friedland
International Computer Science Institute

Henry Cook
University of California

Wonyong Sung
Seoul National University

Armando Fox
University of California

Bor-Yiing Su
University of California