Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Sunayana Sitaram is active.

Publication


Featured research published by Sunayana Sitaram.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2013

Bootstrapping Text-to-Speech for speech processing in languages without an orthography

Sunayana Sitaram; Sukhada Palkar; Yun-Nung Chen; Alok Parlikar; Alan W. Black

Speech synthesis technology has reached the stage where, given a well-designed corpus of audio with accurate transcriptions, an at least understandable synthesizer can be built without necessarily resorting to new innovations. However, many languages do not have a well-defined writing system, but such languages could still greatly benefit from speech systems. In this paper we consider the case where we have a (potentially large) single-speaker database but have no transcriptions and no standardized way to write transcriptions. To address this scenario, we propose a method that allows us to bootstrap synthetic voices purely from speech data. We use a novel combination of automatic speech recognition and automatic word segmentation for the bootstrapping. Our experimental results on speech corpora in two languages, English and German, show that synthetic voices built using this method are close to understandable. Our method is language-independent and can thus be used to build synthetic voices from a speech corpus in any new language.
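
The bootstrapping recipe lends itself to a compact outline. The sketch below is a minimal illustration of the loop the abstract describes, assuming hypothetical phone_recognizer, segmenter, and voice_builder components that stand in for real ASR and voice-building toolkits.

    # Illustrative sketch of the bootstrapping loop described above.
    # The three components are hypothetical placeholders, not toolkit APIs.

    def bootstrap_voice(wav_files, phone_recognizer, segmenter, voice_builder):
        """Build a synthetic voice from untranscribed single-speaker audio."""
        pseudo_transcripts = []
        for wav in wav_files:
            # Step 1: decode each utterance into a phone string with a
            # (possibly cross-lingual) phone recognizer.
            phones = phone_recognizer.decode(wav)      # e.g. "h e l ou w er l d"
            # Step 2: group the phone string into word-like units with an
            # unsupervised word segmenter.
            words = segmenter.segment(phones)          # e.g. ["helou", "werld"]
            pseudo_transcripts.append((wav, " ".join(words)))
        # Step 3: treat the segmented phone strings as an orthography and
        # train the synthesizer on the audio/pseudo-transcript pairs.
        return voice_builder.train(pseudo_transcripts)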


ACM Transactions on Speech and Language Processing | 2011

Two methods for assessing oral reading prosody

Minh Duong; Jack Mostow; Sunayana Sitaram

We compare two types of models to assess the prosody of children's oral reading. Template models measure how well the child's prosodic contour in reading a given sentence correlates in pitch, intensity, pauses, or word reading times with an adult narration of the same sentence. We evaluate template models directly against a common rubric used to assess fluency by hand, and indirectly by their ability to predict fluency and comprehension test scores and gains of 10 children who used Project LISTEN's Reading Tutor; the template models outpredict the human assessment. We also use the same set of adult narrations to train generalized models for mapping text to prosody, and use them to evaluate children's prosody. Using only durational features for both types of models, the generalized models perform better at predicting fluency and comprehension posttest scores of 55 children ages 7-10, with adjusted R² of 0.6. Such models could help teachers identify which students are making adequate progress. The generalized models have the additional advantage of not requiring an adult narration of every sentence.
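
The template-model idea reduces to correlating the child's prosodic contour with the adult's. Below is a minimal Python sketch for one feature, word reading times; the duration values are invented for illustration, and real use would extract them from aligned audio.

    # Minimal sketch of a template model: score a child's reading of a
    # sentence by correlating per-word reading times against an adult
    # narration of the same sentence.

    import numpy as np

    def template_score(child_word_durations, adult_word_durations):
        """Pearson correlation between child and adult duration contours."""
        child = np.asarray(child_word_durations, dtype=float)
        adult = np.asarray(adult_word_durations, dtype=float)
        assert child.shape == adult.shape, "same sentence, same word count"
        return float(np.corrcoef(child, adult)[0, 1])

    # Example: a child whose word timing tracks the adult's scores high.
    adult  = [0.30, 0.22, 0.45, 0.28, 0.60]   # seconds per word (invented)
    fluent = [0.33, 0.25, 0.50, 0.30, 0.66]
    choppy = [0.50, 0.48, 0.52, 0.49, 0.51]
    print(template_score(fluent, adult))      # close to 1.0
    print(template_score(choppy, adult))      # much lower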


North American Chapter of the Association for Computational Linguistics (NAACL) | 2016

Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning

Yulia Tsvetkov; Sunayana Sitaram; Manaal Faruqui; Guillaume Lample; Patrick Littell; David R. Mortensen; Alan W. Black; Lori S. Levin; Chris Dyer

We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on typological information about the language to be predicted. We apply these to the problem of modeling phone sequences, a domain in which universal symbol inventories and cross-linguistically shared feature representations are a natural fit. Intrinsic evaluation on held-out perplexity, qualitative analysis of the learned representations, and extrinsic evaluation in two downstream applications that make use of phonetic features show (i) that polyglot models better generalize to held-out data than comparable monolingual models and (ii) that polyglot phonetic feature representations are of higher quality than those learned monolingually.
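
In outline, a polyglot model is an ordinary sequence model whose input at every step combines a shared symbol embedding with a typology vector for the language being predicted. The PyTorch sketch below is an assumed minimal instantiation (an LSTM with illustrative dimensions), not the paper's exact architecture.

    # Sketch of a polyglot phone-sequence LM: phone embeddings shared
    # across languages, with a per-language typology vector concatenated
    # to every input step.

    import torch
    import torch.nn as nn

    class PolyglotPhoneLM(nn.Module):
        def __init__(self, n_phones, typology_dim, emb_dim=64, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(n_phones, emb_dim)  # shared inventory
            self.rnn = nn.LSTM(emb_dim + typology_dim, hidden_dim,
                               batch_first=True)
            self.out = nn.Linear(hidden_dim, n_phones)

        def forward(self, phone_ids, typology):
            # phone_ids: (batch, seq_len); typology: (batch, typology_dim)
            x = self.embed(phone_ids)
            # Condition every timestep on the language's typological features.
            t = typology.unsqueeze(1).expand(-1, x.size(1), -1)
            h, _ = self.rnn(torch.cat([x, t], dim=-1))
            return self.out(h)   # logits over the next phone at each step

    # Usage: next-phone cross-entropy, as in ordinary language modeling.
    model = PolyglotPhoneLM(n_phones=100, typology_dim=30)
    logits = model(torch.randint(0, 100, (8, 20)), torch.rand(8, 30))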


ACM Symposium on Computing for Development | 2013

A Hindi speech recognizer for an agricultural video search application

Kalika Bali; Sunayana Sitaram; Sébastien Cuendet; Indrani Medhi

Voice user interfaces for ICTD applications have immense potential in their ability to reach a large illiterate or semi-literate population in regions where text-based interfaces are of little use. However, building speech systems for a new language is a highly resource-intensive task. There have been attempts in the past to develop techniques that circumvent the need for the large amounts of data and technical expertise required to build such systems. In this paper we present the development and evaluation of an application-specific speech recognizer for Hindi. We use the Salaam method [4], which bootstraps off a high-quality English speech engine, to develop a mobile speech-based agricultural video search application for farmers in India. With very little training data for a 79-word vocabulary, we achieve accuracies above 90% in test and field deployments. We report observations from the field that we believe are critical to the effective development and usability of a speech application in ICTD.
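
The core of the Salaam method is to spell out each target-language word as a pronunciation over the high-resource engine's phone set, so no Hindi acoustic model is needed. A toy Python sketch follows, with made-up lexicon entries rather than the deployed 79-word vocabulary.

    # Illustrative Salaam-style lexicon: Hindi vocabulary words written
    # as approximate pronunciations in an English (ARPAbet-like) phone
    # set, so an off-the-shelf English engine can recognize them.

    salaam_lexicon = {
        "pani":  ["P", "AA", "N", "IY"],   # water
        "kheti": ["K", "EY", "T", "IY"],   # farming
        "beej":  ["B", "IY", "JH"],        # seed
    }

    def build_grammar(lexicon):
        """Restrict the engine to the application vocabulary: a flat
        one-of-N grammar over the mapped pronunciations."""
        return [(word, " ".join(phones)) for word, phones in lexicon.items()]

    for word, pron in build_grammar(salaam_lexicon):
        print(f"{word}: {pron}")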


International Conference on Advances in Pattern Recognition | 2009

DA-IICT Cross-lingual and Multilingual Corpora for Speaker Recognition

Hemant A. Patil; Sunayana Sitaram; Esha Sharma

In this paper we present the design and development of the DA-IICT Cross-lingual and Multilingual Speech Corpora, which include unconventional sounds such as coughs, whistles, whispers, frication, and idiosyncrasies from bilingual subjects (i.e., speakers of Hindi and Indian English) and trilingual subjects (speakers of Hindi, Indian English, and a mother tongue), collected for the development of automatic speaker recognition systems. Thirteen Indian languages and Nepali are considered as the subjects' mother tongues/native languages. The unconventional sounds are included to examine how much speaker-specific information they carry. Finally, a speaker recognition system based on spectral and cepstral features (i.e., LPC, LPCC, MFCC) and a polynomial classifier with second-order approximation is presented to evaluate the developed corpora.
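
To make the evaluation pipeline concrete, here is a hedged Python sketch: librosa extracts MFCCs, and a second-order polynomial feature expansion feeds a linear classifier as a stand-in for the paper's second-order polynomial classifier.

    # Sketch of a cepstral-feature speaker-recognition pipeline. The
    # logistic-regression backend is a stand-in, shown only to make the
    # polynomial feature expansion concrete.

    import librosa
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LogisticRegression

    def speaker_features(wav_path, n_mfcc=13):
        y, sr = librosa.load(wav_path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
        return mfcc.mean(axis=1)   # one utterance-level vector

    # X: utterance feature vectors, y: speaker labels (assumed prepared).
    # Second-order terms let a linear model act as a 2nd-order polynomial
    # classifier over the cepstral features.
    poly = PolynomialFeatures(degree=2)
    # clf = LogisticRegression(max_iter=1000).fit(poly.fit_transform(X), y)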


9th ISCA Speech Synthesis Workshop | 2016

Open-Source Consumer-Grade Indic Text to Speech

Andrew Wilkinson; Alok Parlikar; Sunayana Sitaram; Tim White; Alan W. Black; Suresh Bazaj

Open-source text-to-speech (TTS) software has enabled the development of voices in multiple languages, including many high-resource languages, such as English and European languages. However, building voices for low-resource languages is still challenging. We describe the development of TTS systems for 12 Indian languages using the Festvox framework, for which we developed a common frontend for Indian languages. Voices for eight of these 12 languages are available for use with Flite, a lightweight, fast run-time synthesizer, and the Android Flite app available in the Google Play store. Recently, the baseline Punjabi TTS voice was built end-to-end in a month by two undergraduate students (without any prior knowledge of TTS) with help from two of the authors of this paper. The framework can be used to build a baseline Indic TTS voice in two weeks, once a text corpus is selected and a suitable native speaker is identified.
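
One reason a common Indic frontend is feasible is that the Unicode blocks for the major Indic scripts share a layout inherited from ISCII, so a single offset-to-phone table can serve many languages. The sketch below illustrates this with a few consonants; the phone table is invented for illustration and is not the Festvox frontend's actual inventory.

    # Sketch: one codepoint-offset table covers several Indic scripts,
    # since their Unicode blocks are laid out in parallel.

    BLOCKS = {"devanagari": 0x0900, "bengali": 0x0980, "gujarati": 0x0A80,
              "tamil": 0x0B80, "telugu": 0x0C00, "kannada": 0x0C80}

    # offset within the block -> rough phone (a few consonants only)
    OFFSET_TO_PHONE = {0x15: "k", 0x17: "g", 0x24: "t", 0x26: "d",
                       0x2A: "p", 0x2C: "b", 0x2E: "m", 0x30: "r"}

    def graphemes_to_phones(text, script="devanagari"):
        base = BLOCKS[script]
        phones = []
        for ch in text:
            off = ord(ch) - base
            if off in OFFSET_TO_PHONE:
                phones.append(OFFSET_TO_PHONE[off])
        return phones

    print(graphemes_to_phones("कगपम"))   # ['k', 'g', 'p', 'm']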


9th ISCA Speech Synthesis Workshop | 2016

Experiments with Cross-lingual Systems for Synthesis of Code-Mixed Text

Sunayana Sitaram; Sai Krishna Rallabandi; Shruti Rijhwani; Alan W. Black

Most Text to Speech (TTS) systems today assume that the input is in a single language written in its native script, which is the language that the TTS database is recorded in. However, due to the rise in conversational data available from social media, phenomena such as code-mixing, in which multiple languages are used together in the same conversation or sentence, are now seen in text. TTS systems capable of synthesizing such text need to be able to handle multiple languages at the same time, and may also need to deal with noisy input. Previously, we proposed a framework to synthesize code-mixed text by using a TTS database in a single language, identifying the language that each word was from, normalizing spellings of a language written in a non-standardized script, and mapping the phonetic space of the mixed language to the language that the TTS database was recorded in. We extend this cross-lingual approach to more language pairs, and improve upon our language identification technique. We conduct listening tests to determine which of the two languages being mixed should be used as the target language. We perform experiments for code-mixed Hindi-English and German-English and conduct listening tests with bilingual speakers of these languages. From our subjective experiments we find that listeners have a strong preference for cross-lingual systems with Hindi as the target language for code-mixed Hindi and English text. We also find that listeners prefer cross-lingual systems in English that can synthesize German text for code-mixed German and English text.
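
The first stage of the pipeline, word-level language identification, can be illustrated with a simple script-based heuristic for Hindi-English text: Devanagari words are tagged Hindi, Latin-script words English. This is only a sketch; romanized Hindi, which the spelling-normalization step addresses, would defeat it and needs a trained classifier instead.

    # Sketch of script-based word-level language ID for Hindi-English
    # code-mixed text.

    def word_language(word):
        """Tag a word as 'hi' or 'en' by dominant script."""
        deva = sum(1 for c in word if 0x0900 <= ord(c) <= 0x097F)
        return "hi" if deva > len(word) / 2 else "en"

    sentence = "मुझे यह song बहुत पसंद है"
    print([(w, word_language(w)) for w in sentence.split()])
    # [('मुझे', 'hi'), ('यह', 'hi'), ('song', 'en'), ('बहुत', 'hi'),
    #  ('पसंद', 'hi'), ('है', 'hi')]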


Conference of the International Speech Communication Association (INTERSPEECH) | 2015

Using articulatory features and inferred phonological segments in zero resource speech processing

Pallavi Baljekar; Sunayana Sitaram; Prasanna Kumar Muthukumar; Alan W. Black


8th ISCA Speech Synthesis Workshop | 2013

Text to Speech in New Languages without a Standardized Orthography

Sunayana Sitaram; Gopala Krishna Anumanchipalli; Justin Chiu; Alok Parlikar; Alan W. Black


Florida AI Research Society (FLAIRS) Conference | 2012

Mining Data from Project LISTEN's Reading Tutor to Analyze Development of Children's Oral Reading Prosody

Sunayana Sitaram; Jack Mostow

Collaboration


Dive into Sunayana Sitaram's collaborations.

Top Co-Authors

Alan W. Black
Carnegie Mellon University

Alok Parlikar
Carnegie Mellon University

Jack Mostow
Carnegie Mellon University

Sai Krishna Rallabandi
International Institute of Information Technology

Sébastien Cuendet
École Polytechnique Fédérale de Lausanne

Anders Weinstein
Carnegie Mellon University