Publication


Featured research published by Kriste Krstovski.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Recent improvements and performance analysis of ASR and MT in a speech-to-speech translation system

David Stallard; Chia-Lin Kao; Kriste Krstovski; Daben Liu; Premkumar Natarajan; Rohit Prasad; Shirin Saleem; Krishna Subramanian

We report on recent ASR and MT work on our English/Iraqi Arabic speech-to-speech translation system. We present detailed results for both objective and subjective evaluations of translation quality, along with a detailed analysis and categorization of translation errors. We also present novel ideas for quantifying the relative importance of different subjective error categories, and for assigning the blame for an error to a particular phrase pair in the translation model.


International Conference on the Theory of Information Retrieval | 2013

Efficient Nearest-Neighbor Search in the Probability Simplex

Kriste Krstovski; David A. Smith; Hanna M. Wallach; Andrew McGregor

Document similarity tasks arise in many areas of information retrieval and natural language processing. A fundamental question when comparing documents is which representation to use. Topic models, which have served as versatile tools for exploratory data analysis and visualization, represent documents as probability distributions over latent topics. Systems comparing topic distributions thus use measures of probability divergence such as Kullback-Leibler, Jensen-Shannon, or Hellinger. This paper presents novel analysis and applications of the reduction of Hellinger divergence to Euclidean distance computations. This reduction allows us to exploit fast approximate nearest-neighbor (NN) techniques, such as locality-sensitive hashing (LSH) and approximate search in k-d trees, for search in the probability simplex. We demonstrate the effectiveness and efficiency of this approach on two tasks using latent Dirichlet allocation (LDA) document representations: discovering relationships between National Institutes of Health (NIH) grants and prior-art retrieval for patents. Evaluation on these tasks and on synthetic data shows that both Euclidean LSH and approximate k-d tree search perform well when a single nearest neighbor must be found. When a larger set of similar documents is to be retrieved, the k-d tree approach is more effective and efficient.
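The reduction mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common convention in which the Hellinger distance carries a 1/sqrt(2) factor, so that it equals the Euclidean distance between element-wise square roots of the two distributions divided by sqrt(2). The function names and the toy topic distributions are hypothetical, and the brute-force scan stands in for the LSH and k-d tree indexes the paper evaluates; it is not the authors' implementation.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (1/sqrt(2) convention)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def sqrt_embed(p):
    """Map a point in the probability simplex to its element-wise square root."""
    return [math.sqrt(a) for a in p]

def euclidean(u, v):
    """Ordinary Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_by_euclidean(query, docs):
    """Index of the nearest document, found by Euclidean distance in sqrt space.

    Brute force here; any Euclidean nearest-neighbor index could be substituted.
    """
    q = sqrt_embed(query)
    embedded = [sqrt_embed(d) for d in docs]
    return min(range(len(docs)), key=lambda i: euclidean(q, embedded[i]))

# Hypothetical topic distributions over three topics.
docs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
query = [0.6, 0.3, 0.1]

best = nearest_by_euclidean(query, docs)
print(best, hellinger(query, docs[best]))
```

Because the two distances differ only by a constant factor, the ranking of neighbors is the same under either one, which is what lets standard Euclidean search structures be reused on the embedded vectors.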


Computer Speech & Language | 2013

BBN TransTalk: Robust multilingual two-way speech-to-speech translation for mobile platforms

Rohit Prasad; Prem Natarajan; David Stallard; Shirin Saleem; Shankar Ananthakrishnan; Stavros Tsakalidis; Chia-Lin Kao; Fred Choi; Ralf Meermeier; Mark Rawls; Jacob Devlin; Kriste Krstovski; Aaron Challenner

In this paper we present a speech-to-speech (S2S) translation system called the BBN TransTalk that enables two-way communication between speakers of English and speakers who do not understand or speak English. The BBN TransTalk has been configured for several languages including Iraqi Arabic, Pashto, Dari, Farsi, Malay, Indonesian, and Levantine Arabic. We describe the key components of our system: automatic speech recognition (ASR), machine translation (MT), text-to-speech (TTS), dialog manager, and the user interface (UI). In addition, we present novel techniques for overcoming specific challenges in developing high-performing S2S systems. For ASR, we present techniques for dealing with lack of pronunciation and linguistic resources and effective modeling of ambiguity in pronunciations of words in these languages. For MT, we describe techniques for dealing with data sparsity as well as modeling context. We also present and compare different user confirmation techniques for detecting errors that can cause the dialog to drift or stall.


Spoken Language Technology Workshop | 2008

Recent improvements in BBN's English/Iraqi speech-to-speech translation system

Fred Choi; Stavros Tsakalidis; Shirin Saleem; Chia-Lin Kao; Ralf Meermeier; Kriste Krstovski; Christine Moran; Krishna Subramanian; David Stallard; Rohit Prasad; Prem Natarajan

We report on recent improvements in our English/Iraqi Arabic speech-to-speech translation system. User interface improvements include a novel parallel approach to user confirmation which makes confirmation cost-free in terms of dialog duration. Automatic speech recognition improvements include the incorporation of state-of-the-art techniques in feature transformation and discriminative training. Machine translation improvements include a novel combination of multiple alignments derived from various pre-processing techniques, such as Arabic segmentation and English word compounding, higher order N-grams for target language model, and use of context in form of semantic classes and part-of-speech tags.


2007 IEEE International Conference on Portable Information Devices | 2007

Real-Time Speech-to-Speech Translation for PDAs

Rohit Prasad; Kriste Krstovski; Fred Choi; Shirin Saleem; Prem Natarajan; Michael Decerbo; David Stallard

In this paper we present a speech-to-speech translation system configured for translingual communication in English and colloquial Iraqi on a mobile, handheld device. The end-to-end system employs a medium/large vocabulary n-gram speech recognition engine for recognizing English and colloquial Iraqi, a question canonicalizer for mapping a recognized English question or command to one of the questions supported in the system, a concept translation engine for translating recognized Iraqi text, and a text-to-speech synthesis engine for playing back the English translation for the Iraqi to the English speaker. In addition to describing the system architecture and the functionality of the components, we present optimization techniques that enable low-latency, real-time speech recognition on low-power hardware platforms.


Document Analysis Systems | 2008

End-to-End Trainable Thai OCR System Using Hidden Markov Models

Kriste Krstovski; Ehry MacRostie; Rohit Prasad; Premkumar Natarajan

In this paper we present an end-to-end trainable optical character recognition (OCR) system for recognizing machine-printed text in Thai documents. The end-to-end OCR system is based on a script-independent methodology using hidden Markov models. Our system provides an integrated workflow beginning with annotation and transcription of training images to performing OCR on new images with models trained on transcribed training images. The efficacy of our end-to-end OCR system is demonstrated by rapidly configuring our OCR engine for the Thai script. We present experimental results on Thai documents to highlight the specific challenges posed by the Thai script and analyze the recognition performance as a function of amount of training data.


North American Chapter of the Association for Computational Linguistics | 2016

Online Multilingual Topic Models with Multi-Level Hyperpriors

Kriste Krstovski; David A. Smith; Michael J. Kurtz

For topic models, such as LDA, that use a bag-of-words assumption, it becomes especially important to break the corpus into appropriately-sized “documents”. Since the models are estimated solely from the term cooccurrences, extensive documents such as books or long journal articles lead to diffuse statistics, and short documents such as forum posts or product reviews can lead to sparsity. This paper describes practical inference procedures for hierarchical models that smooth topic estimates for smaller sections with hyperpriors over larger documents. Importantly for large collections, these online variational Bayes inference methods perform a single pass over a corpus and achieve better perplexity than “flat” topic models on monolingual and multilingual data. Furthermore, on the task of detecting document translation pairs in large multilingual collections, polylingual topic models (PLTM) with multi-level hyperpriors (mlhPLTM) achieve significantly better performance than existing online PLTM models while retaining computational efficiency.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

A framework for manipulating and searching multiple retrieval types

Marc-Allen Cartright; Ethem F. Can; William Dabney; Jeffrey Dalton; Logan Giorda; Kriste Krstovski; Xiaoye Wu; Ismet Zeki Yalniz; James Allan; R. Manmatha; David A. Smith

Conventional retrieval systems view documents as a unit and look at different retrieval types within a document. We introduce Proteus, a framework for seamlessly navigating books as dynamic collections which are defined on the fly. Proteus allows us to search various retrieval types. Navigable types include pages, books, named persons, locations, and pictures in a collection of books taken from the Internet Archive. The demonstration shows the value of multi-type browsing in dynamic collections to peruse new data.


Sensor Review | 2006

Consolidated data acquisition and management for first responders

Dragan Vidacic; Pavlo Melnyk; Kriste Krstovski; Richard A. Messner; Frank C. Hludik; Andrew L. Kun

Purpose – To design an efficient and integrated framework for automated and simple data acquisition and processing targeted for first response scenarios. Design/methodology/approach – Utilizes existing software/hardware integration tools and primarily off‐the‐shelf components. Use the modular system architecture for development of new applications. System construction is preceded by the analysis of currently available devices for specific data acquisition and processing. Findings – The development and integration of data acquisition and processing tools for first responder scenarios can be rapidly achieved by the modular and already existing software/hardware integration platform. Data types processed by this system are biometrics, live video/audio and textual/command data. The data acquisition is followed by the prompt dissemination of information from the incident scene thus overcoming interoperability issues. Practical implications – Integration of new modules is achieved through simple system upgrades – ...


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2015

Evaluating Retrieval Models through Histogram Analysis

Kriste Krstovski; David A. Smith; Michael J. Kurtz

We present a novel approach for efficiently evaluating the performance of retrieval models and introduce two evaluation metrics: Distributional Overlap (DO), which compares the clustering of scores of relevant and non-relevant documents, and Histogram Slope Analysis (HSA), which examines the log of the empirical distributions of relevant and non-relevant documents. Unlike rank evaluation metrics such as mean average precision (MAP) and normalized discounted cumulative gain (NDCG), DO and HSA only require calculating model scores of queries and a fixed sample of relevant and non-relevant documents rather than scoring the entire collection, even implicitly by means of an inverted index. In experimental meta-evaluations, we find that HSA achieves high correlation with MAP and NDCG on a monolingual and a cross-language document similarity task; on four ad-hoc web retrieval tasks; and on an analysis of ten TREC tasks from the past ten years. In addition, when evaluating latent Dirichlet allocation (LDA) models on document similarity tasks, HSA achieves better correlation with MAP and NDCG than perplexity, an intrinsic metric widely used with topic models.

Collaboration


Dive into Kriste Krstovski's collaborations.

Top Co-Authors

David A. Smith

University of Massachusetts Amherst

Prem Natarajan

University of Southern California

Michael J. Kurtz

Smithsonian Astrophysical Observatory
