
Publication


Featured research published by Adriana Stan.


Spoken Language Technology Workshop | 2012

A grapheme-based method for automatic alignment of speech and text data

Adriana Stan; Peter Bell; Simon King

This paper introduces a method for automatic alignment of speech data with unsynchronised, imperfect transcripts, for a domain where no initial acoustic models are available. Using grapheme-based acoustic models, word skip networks and orthographic speech transcripts, we are able to harvest 55% of the speech with a 93% utterance-level accuracy and 99% word accuracy for the produced transcriptions. The work is based on the assumption that there is a high degree of correspondence between the speech and text, and that a full transcription of all of the speech is not required. The method is language independent and the only prior knowledge and resources required are the speech and text transcripts, and a few minor user interventions.
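The word-skip-network idea above can be illustrated as a dynamic-programming alignment in which transcript words may be skipped at a small cost, so imperfect or extra transcript text does not derail the match. This is a toy sketch under assumed costs, not the paper's actual decoder-based implementation:

```python
# Illustrative skip alignment: recogniser output vs. an imperfect transcript.
# Words on either side may be skipped (skip_cost); mismatches cost sub_cost.
def skip_align(hyp, ref, skip_cost=1, sub_cost=2):
    """Return the cost of aligning hypothesis words `hyp` to transcript `ref`."""
    n, m = len(hyp), len(ref)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # skipping leading hypothesis words
        dp[i][0] = dp[i - 1][0] + skip_cost
    for j in range(1, m + 1):          # skipping leading transcript words
        dp[0][j] = dp[0][j - 1] + skip_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0 if hyp[i - 1] == ref[j - 1] else sub_cost
            dp[i][j] = min(dp[i - 1][j - 1] + match,   # match or substitute
                           dp[i][j - 1] + skip_cost,   # skip transcript word
                           dp[i - 1][j] + skip_cost)   # skip hypothesis word
    return dp[n][m]
```

A low alignment cost then signals a segment whose transcript can be trusted and harvested.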


International Conference on Acoustics, Speech, and Signal Processing | 2014

Neural net word representations for phrase-break prediction without a part-of-speech tagger

Oliver Watts; Siva Reddy Gangireddy; Junichi Yamagishi; Simon King; Steve Renals; Adriana Stan; Mircea Giurgiu

The use of shared projection neural nets of the sort used in language modelling is proposed as a way of sharing parameters between multiple text-to-speech system components. We experiment with pretraining the weights of such a shared projection on an auxiliary language modelling task and then apply the resulting word representations to the task of phrase-break prediction. Doing so allows us to build phrase-break predictors that rival conventional systems without any reliance on conventional knowledge-based resources such as part of speech taggers.
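The parameter-sharing idea can be sketched structurally: one projection (embedding) matrix serves both the language-modelling task used for pretraining and the phrase-break predictor. The vocabulary, dimensions, and function names below are invented for illustration; the training loop itself is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2, "down": 3}   # toy vocabulary
dim = 4
E = rng.normal(size=(len(vocab), dim))              # shared projection matrix

def lm_features(context_words):
    """Language-model task: concatenate projections of the context words."""
    return np.concatenate([E[vocab[w]] for w in context_words])

def break_features(word):
    """Phrase-break task: reuse the SAME projection rows as features,
    so pretraining the LM task also shapes the break predictor's inputs."""
    return E[vocab[word]]
```

Because both functions index the same matrix `E`, gradient updates from the auxiliary language-modelling task directly change the representations the break predictor sees, replacing hand-built POS features.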


International Conference on Acoustics, Speech, and Signal Processing | 2013

Lightly supervised GMM VAD to use audiobook for speech synthesiser

Yoshitaka Mamiya; Junichi Yamagishi; Oliver Watts; Robert A. J. Clark; Simon King; Adriana Stan

Audiobooks have attracted attention as promising data for training text-to-speech (TTS) systems. However, they usually lack a correspondence between the audio and text data, and are typically divided only into chapter units. In practice, the audio and text must be put into correspondence before they can be used to build TTS synthesisers, but aligning them is time-consuming, involves manual labour, and requires people skilled in speech processing. Previously, we proposed using graphemes to automatically align speech and text data. This paper further integrates a lightly supervised voice activity detection (VAD) technique that detects sentence boundaries as a pre-processing step before the grapheme approach. The lightly supervised technique requires time stamps of speech and silence only for the first fifty sentences. Combining these, we can semi-automatically build TTS systems from audiobooks with minimal manual intervention. Through subjective evaluations, we analyse how the grapheme-based aligner and/or the proposed VAD technique affect the quality of HMM-based speech synthesisers trained on audiobooks.
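A lightly supervised GMM VAD of this flavour can be sketched as a two-component Gaussian mixture over per-frame log-energies, with the small labelled prefix seeding the speech and silence components and a few EM steps refining them on all frames. The feature choice and thresholds here are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def gmm_vad(energy, labels_prefix, n_iter=10):
    """energy: per-frame log-energy; labels_prefix: 1=speech / 0=silence
    labels for the first len(labels_prefix) frames (the supervised part).
    Returns a 0/1 speech decision per frame."""
    e = np.asarray(energy, dtype=float)
    lab = np.asarray(labels_prefix)
    sup = e[:len(lab)]
    # seed component 0 (silence) and 1 (speech) from the labelled prefix
    mu = np.array([sup[lab == 0].mean(), sup[lab == 1].mean()])
    var = np.array([sup[lab == 0].var() + 1e-6, sup[lab == 1].var() + 1e-6])
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):                       # EM over ALL frames
        lik = w * np.exp(-(e[:, None] - mu) ** 2 / (2 * var)) \
              / np.sqrt(2 * np.pi * var)
        resp = lik / lik.sum(axis=1, keepdims=True)   # E-step
        w = resp.mean(axis=0)                         # M-step
        mu = (resp * e[:, None]).sum(axis=0) / resp.sum(axis=0)
        var = (resp * (e[:, None] - mu) ** 2).sum(axis=0) \
              / resp.sum(axis=0) + 1e-6
    return (resp[:, 1] > 0.5).astype(int)
```

Silences found this way then serve as candidate sentence boundaries for the grapheme-based aligner.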


Computer Speech & Language | 2016

ALISA: An automatic lightly supervised speech segmentation and alignment tool

Adriana Stan; Yoshitaka Mamiya; Junichi Yamagishi; Peter Bell; Oliver Watts; Robert A. J. Clark; Simon King

This paper describes the ALISA tool, which implements a lightly supervised method for sentence-level alignment of speech with imperfect transcripts. Its intended use is to enable the creation of new speech corpora from a multitude of resources in a language-independent fashion, thus avoiding the need to record or transcribe speech data. The method is designed to require minimal user intervention and expert knowledge, and it is able to align data in languages which employ alphabetic scripts. It comprises a GMM-based voice activity detector and a highly constrained grapheme-based speech aligner. The method is evaluated objectively against a gold-standard segmentation and transcription, as well as subjectively through building and testing speech synthesis systems from the retrieved data. Results show that, on average, 70% of the original data is correctly aligned, with a word error rate of less than 0.5%. In one case, subjective listening tests show a small but statistically significant preference for voices built on the gold transcript; in the other tests, no statistically significant differences are found between systems built from the fully supervised training data and those built with the proposed method.


International Symposium on Electronics and Telecommunications | 2010

Romanian language statistics and resources for text-to-speech systems

Adriana Stan; Mircea Giurgiu

This paper introduces a series of results and experiments used in the development of a Romanian text-to-speech system, focusing on text statistics. We investigate the presence of several linguistic units used in text-to-speech systems, from phonemes to words. The text corpus we used, News-Romanian (News-RO), comprises 4500 newspaper articles. A subset of it, of around 2500 sentences, represents the Romanian Speech Synthesis (RSS) recorded speech database. The results offer important insight into how a speech database should be designed. We also describe the methods used in the development of a 50,000-word Romanian lexicon with phonetic transcriptions and accent positions. Such a lexicon is useful for the machine learning algorithms in the front-end of a text-to-speech system. In addition, we study the use of the Maximal Onset Principle for Romanian syllabification.
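The Maximal Onset Principle assigns as many of the consonants between two vowels as possible to the onset of the following syllable, as long as the resulting onset is legal in the language. The sketch below uses a heavily simplified vowel set and onset inventory invented for illustration (diphthongs are not handled), not the paper's Romanian phonotactics:

```python
# Assumed, simplified inventories for illustration only.
VOWELS = set("aeiouăâî")
LEGAL_ONSETS = {"", "b", "c", "d", "f", "g", "l", "m", "n", "p", "r",
                "s", "t", "v", "z", "st", "tr", "pr", "cr", "br", "pl", "str"}

def syllabify(word):
    """Split a word into syllables, giving each syllable the longest
    legal onset (Maximal Onset Principle)."""
    vowel_pos = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(vowel_pos) < 2:
        return [word]
    syllables, start = [], 0
    for v1, v2 in zip(vowel_pos, vowel_pos[1:]):
        cluster = word[v1 + 1:v2]                 # consonants between nuclei
        # take the longest suffix of the cluster that is a legal onset
        for k in range(len(cluster) + 1):
            if cluster[k:] in LEGAL_ONSETS:
                break
        syllables.append(word[start:v1 + 1 + k])
        start = v1 + 1 + k
    syllables.append(word[start:])
    return syllables
```

For example, an intervocalic "str" stays whole as an onset, while "rt" splits after the "r" because "rt" is not a legal onset.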


2011 6th Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2011

A superpositional model applied to F0 parameterization using DCT for text-to-speech synthesis

Adriana Stan; Mircea Giurgiu

This paper develops the idea of a superpositional model based on DCT (Discrete Cosine Transform) parameterization of F0 contours. We examine the capacity of the DCT coefficients to estimate the fast variations of the F0 contour at the syllable level, as well as the overall trend at the phrase level. The method determines the syllable-level coefficients from the residual obtained by subtracting the estimated phrase-level contour from the original one, thus treating the syllable as an additive prosodic effect on top of the phrase level. We also compare three different decision- and regression-tree algorithms for clustering and predicting the DCT coefficients. Additional features are selected with a greedy stepwise feature-selection method without backtracking. The results support the proposed method through low average squared errors and little or no perceivable error in the synthesized speech.
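The decomposition step can be sketched with an orthonormal DCT-II: the low-order coefficients of the full contour reconstruct a slow phrase-level trend, and the residual is what the syllable-level coefficients would then model. The number of phrase coefficients below is an illustrative choice, not the paper's setting:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    M[0] /= np.sqrt(2.0)
    return M

def phrase_and_residual(f0, n_phrase=3):
    """Estimate the phrase-level contour from the first n_phrase DCT
    coefficients and return (phrase_contour, residual)."""
    M = dct_matrix(len(f0))
    c = M @ f0                                # analysis
    c_phrase = np.zeros_like(c)
    c_phrase[:n_phrase] = c[:n_phrase]        # keep only slow components
    phrase = M.T @ c_phrase                   # inverse DCT of the truncation
    return phrase, f0 - phrase
```

Per-syllable DCTs of the residual (not shown) would then give the fast, local parameters the trees cluster and predict.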


E-Health and Bioengineering Conference | 2015

Voice-related quality of life results in laryngectomies with today's speech options and expectations from the next generation of vocal assistive technologies

Cristina Tiple; Silviu Matu; Florina Veronica Dinescu; Rodica Mureşan; Radu Soflau; Tudor Drugan; Mircea Giurgiu; Adriana Stan; Daniel David; Magdalena Chirila

Objectives: To assess the voice handicap and the satisfaction with today's voice-assisting methods, and to identify the needs that should be addressed by new vocal assistive technologies for aphonic patients. Materials and Methods: We conducted a prospective study on two samples of patients who had undergone total laryngectomy and subsequent speech therapy. Voice Handicap Index (VHI) questionnaires were used, together with qualitative (focus groups) and quantitative (online surveys) methods. Results: Analysis of the total VHI revealed that esophageal and electrolarynx speakers had a moderate voice handicap, while tracheoesophageal speakers and patients without vocal rehabilitation had a severe handicap. Interview and survey data indicated that these patients have many needs which are unmet by the available rehabilitation methods. Conclusions: These results point out the necessity to improve current vocal assistive methods and to develop better technologies that could increase the quality of life of these patients.


2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2015

Phonetic segmentation of speech using STEP and t-SNE

Adriana Stan; Cassia Valentini-Botinhao; Mircea Giurgiu; Simon King

This paper introduces a first attempt to perform phoneme-level segmentation of speech based on a perceptual representation, the Spectro-Temporal Excitation Pattern (STEP), and a dimensionality reduction technique, t-Distributed Stochastic Neighbour Embedding (t-SNE). The method searches for the true phonetic boundaries in the vicinity of those produced by an HMM-based segmentation. It looks for perceptually salient spectral changes which occur at these phonetic transitions, and exploits t-SNE's ability to capture both the local and the global structure of the data. The method is intended to be usable in any language and is therefore not tailored to any particular dataset or language. Results show that this simple approach improves the segmentation accuracy of unvoiced phonemes by 4% within a 5 ms margin, and by 5% within a 10 ms margin. For voiced phonemes, however, accuracy drops slightly.
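The refinement step can be sketched as a local search: around each HMM-proposed boundary, pick the frame with the largest change between neighbouring feature vectors. The paper measures this in a STEP representation reduced with t-SNE; in the sketch below, generic frame features stand in for that embedding, and the window size is an assumption:

```python
import numpy as np

def refine_boundary(frames, hmm_boundary, window=5):
    """frames: (n_frames, dim) feature matrix; hmm_boundary: initial frame
    index from the HMM segmentation. Returns the index within +/- window
    where the frame-to-frame spectral change is largest."""
    # jumps[i] = distance between frames i and i+1
    jumps = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    lo = max(1, hmm_boundary - window)
    hi = min(len(jumps), hmm_boundary + window)
    return lo + int(np.argmax(jumps[lo:hi]))
```

Because unvoiced-to-voiced transitions produce sharp spectral jumps, this kind of peak-picking helps most at unvoiced phoneme boundaries, consistent with the results above.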


2013 7th Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2013

Evaluation of sentiment polarity prediction using a dimensional and a categorical approach

Ioana Muresan; Adriana Stan; Mircea Giurgiu; Rodica Potolea

In this paper we evaluate two approaches for predicting the sentiment polarity of an utterance. The first method is based on a 3-dimensional model which takes into account text expressiveness in terms of valence, arousal and dominance. The second determines the words' semantic orientation according to the chi-square and relevance-factor statistical metrics. We describe the general flow of the methods and their extracted features, as well as their predictive potential using different machine learning algorithms: Naïve Bayes, SVM and C4.5. The evaluation is performed on four emotional datasets: SemEval 2007 "Affective Text", ISEAR (International Survey on Emotional Antecedents and Reactions), children's fairy tales and a movie review dataset. The results show a high correlation of prediction performance with the database content, as well as with the average number of words in the classified text instances.
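A chi-square semantic-orientation score of the kind mentioned above measures how strongly a word's occurrence is associated with one polarity class, from a 2x2 contingency table of (word present/absent) x (positive/negative documents). The counts in the usage note are invented examples:

```python
def chi_square(n_word_pos, n_word_neg, n_pos, n_neg):
    """Chi-square statistic for the association between a word and polarity.
    n_word_pos/n_word_neg: positive/negative documents containing the word;
    n_pos/n_neg: total positive/negative documents."""
    table = [[n_word_pos, n_word_neg],
             [n_pos - n_word_pos, n_neg - n_word_neg]]
    total = n_pos + n_neg
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            row = table[i][0] + table[i][1]
            col = table[0][j] + table[1][j]
            expected = row * col / total          # counts under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2
```

A word spread evenly across classes scores 0 (no orientation), while a word concentrated in one class scores high and becomes a useful polarity feature.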


2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2017

MaRePhoR — An open access machine-readable phonetic dictionary for Romanian

Stefan Adrian Toma; Adriana Stan; Mihai-Lica Pura; Traian Barsan

This paper introduces a novel open-access resource: MaRePhoR, a machine-readable phonetic dictionary for Romanian. It contains over 70,000 word entries with manually produced phonetic transcriptions. The paper describes the dictionary's format and statistics, as well as an initial use of the phonetic transcription entries: building a grapheme-to-phoneme converter based on decision trees. Various training strategies were tested, enabling the selection of a final setup for our predictor. The best results show that, using the dictionary as training data, an accuracy of over 99% can be achieved.
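The paper trains decision trees over grapheme contexts; the sketch below uses a much simpler decision-list stand-in for that idea (not the paper's method): predict each letter's phoneme from the most specific grapheme context seen in training, backing off to shorter contexts. The training pairs are invented toy examples, not MaRePhoR entries:

```python
from collections import Counter, defaultdict

def train_g2p(pairs, max_ctx=1):
    """pairs: list of (word, phonemes) with one phoneme per letter.
    Returns a map from grapheme context strings to the majority phoneme."""
    votes = defaultdict(Counter)
    for word, phones in pairs:
        w = f"#{word}#"                          # word-boundary markers
        for i, ph in enumerate(phones):
            for c in range(max_ctx + 1):         # contexts of width 0..max_ctx
                votes[w[i + 1 - c:i + 2 + c]][ph] += 1
    return {ctx: cnt.most_common(1)[0][0] for ctx, cnt in votes.items()}

def g2p(model, word, max_ctx=1):
    """Transcribe a word, preferring the widest context, backing off."""
    w = f"#{word}#"
    out = []
    for i in range(len(word)):
        for c in range(max_ctx, -1, -1):
            ctx = w[i + 1 - c:i + 2 + c]
            if ctx in model:
                out.append(model[ctx])
                break
    return out
```

With Romanian-like toy data, the context lets the model learn that "c" before "e"/"i" differs from "c" before "a", which is the kind of split a decision tree over letter contexts would also make.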

Collaboration


Dive into Adriana Stan's collaborations.

Top Co-Authors

Mircea Giurgiu, Technical University of Cluj-Napoca
Oliver Watts, University of Edinburgh
Simon King, University of Edinburgh
Junichi Yamagishi, National Institute of Informatics
Peter Bell, University of Edinburgh
Bogdan Orza, Technical University of Cluj-Napoca
Alexandru Moldovan, Technical University of Cluj-Napoca