Dragos Burileanu

Politehnica University of Bucharest

Publication

Featured research published by Dragos Burileanu.

International Journal of Speech Technology | 2002

Basic Research and Implementation Decisions for a Text-to-Speech Synthesis System in Romanian

Dragos Burileanu

Speech synthesis is one of the most language-dependent domains of speech technology. In particular, the natural language processing stage of a text-to-speech (TTS) system contains the largest part of the linguistic knowledge for a given language. In this respect, one can state that building a high-quality TTS system for a new language involves many theoretical and technical challenges. Especially, extensive studies must exist (or be done) at the linguistic level, in order to endow the system with the most relevant language information; this requirement represents an essential condition to obtain a true naturalness of the synthesized speech, starting from unrestricted input text. This paper presents fundamental research and the related implementation issues in developing a complete TTS system in Romanian, emphasizing the language particularities and their influence on improving the language processing stage efficiency. The first section describes our standpoint on TTS synthesis as well as the overall architecture of our TTS system. The next sections formulate several important tasks of the natural language processing stage (input text preprocessing, letter-to-phone conversion, acoustic database preparation) and discuss the design philosophy of the corresponding modules, implementation decisions and evaluation experiments. A distinct section is devoted to an acoustic-phonetic study that assisted the phone-set selection and acoustic database generation. The paper ends with conclusions and a description of the work that is currently in progress at other levels of the TTS system.

Text, Speech and Dialogue | 2000

An Adaptive and Fast Speech Detection Algorithm

Dragos Burileanu; Lucian Pascalin; Corneliu Burileanu; Mihai Puchiu

The detection of speech from silence (actually background noise) is essential in many speech-processing systems. In real-field applications, the correct determination of voice segments highly improves the overall system accuracy and minimises the total computation time. This paper presents a novel robust and reliable speech detection algorithm to be used in a speaker recognition system. The paper first introduces some basic concepts on speech activity detection and reviews the techniques currently used in speech detection tasks. Then, the proposed speech/non-speech detection algorithm is described and experimental results are discussed. Conclusions about the algorithm's performance are finally presented.
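
The abstract does not spell out the algorithm, so the following is only a generic illustration of adaptive speech/non-speech detection, not the authors' method. It is a minimal sketch that assumes an energy threshold tied to a running noise-floor estimate; the function names and the parameters `frame_len`, `ratio`, `alpha`, and `init_frames` are all hypothetical.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(samples, frame_len=160, ratio=4.0, alpha=0.9, init_frames=5):
    """Label each frame True (speech) or False (background noise).

    Hypothetical parameters: `ratio` is how far above the noise floor a
    frame must be to count as speech; `alpha` sets how slowly the floor
    adapts; `init_frames` leading frames are assumed to be noise only.
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    head = energies[:init_frames]
    noise = sum(head) / len(head)  # initial noise-floor estimate
    labels = []
    for energy in energies:
        is_speech = energy > ratio * noise
        if not is_speech:
            # Adapt the noise floor only on frames judged to be non-speech.
            noise = alpha * noise + (1 - alpha) * energy
        labels.append(is_speech)
    return labels


# Synthetic signal: 10 quiet frames, 5 loud sinusoidal frames, 5 quiet frames.
quiet = [0.01 * (-1) ** i for i in range(1600)]
loud = [0.5 * math.sin(2 * math.pi * i / 16) for i in range(800)]
labels = detect_speech(quiet + loud + quiet[:800])
```

Updating the floor only on non-speech frames keeps loud segments from inflating the threshold, which is one common way to make such a detector adaptive at low computational cost.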

2011 6th Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2011

An advanced NLP framework for high-quality Text-to-Speech synthesis

Catalin Ungurean; Dragos Burileanu

In order to build a TTS (Text-to-Speech) synthesis system one must provide two key components: an NLP (Natural Language Processing) stage, which essentially operates on the input text, and a speech generation stage to produce the desired output. These two distinct levels must exchange both data and commands to produce intelligible and natural speech. As the complete TTS task relies on many distinct scientific areas, any achievement toward standardization can minimize the effort and increase the dynamic of the results. This paper gives an overview of the NLP stage in the TTS system for the Romanian language built by our group, and describes the integration into the system of SSML (Speech Synthesis Markup Language), a now well-recognized standard for TTS document authoring and inter-module communication.

International Journal of Speech Technology | 2009

A statistical approach to lexical stress assignment for TTS synthesis

Catalin Ungurean; Dragos Burileanu; Aurelian Dervis

Lexical stress is primarily important to generate a correct pronunciation of words in many languages; hence its correct placement is a major task in prosody prediction and generation for high-quality TTS (text-to-speech) synthesis systems. This paper proposes a statistical approach to lexical stress assignment for TTS synthesis in Romanian. The method is essentially based on n-gram language models at character level, and uses a modified Katz backoff smoothing technique to solve the problem of data sparseness during training. Monosyllabic words are considered as not carrying stress, and are separated by an automatic syllabification algorithm. A maximum accuracy of 99.11% was obtained on a test corpus of about 47,000 words.

Archive | 2008

Spoken Language Interfaces for Embedded Applications

Dragos Burileanu

Speech-enabled interfaces have been increasingly appearing in small devices, such as cellular phones, PDAs, car kits, and various other consumer electronics products, resulting in what is now being called “embedded speech.” The new generation of small-scale computing devices has severe resource constraints, notably low CPU resources and small memory footprints. This makes the design and efficient implementation of speech interfaces for these devices a challenging task. This chapter first discusses the evolution of spoken language interfaces and evaluates their potential benefits for embedded applications. The basic requirements for these kinds of interfaces and the inherent restrictions imposed by low-resource systems are investigated. Then, the chapter analyzes current theoretical and practical solutions in adapting speech recognition and synthesis technologies to portable electronic devices. As a concrete example, implementation issues in developing an optimized embedded version of a complete text-to-speech synthesis system are described.

2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2015

Speech database acquisition for assisted living environment applications

Mihai Dogariu; Horia Cucu; Andi Buzo; Dragos Burileanu; Octavian Fratu

Home automation has become a subject of increasing interest for both industry and research, as awareness of such systems is growing and their benefits are easy to see. The new trend is to develop smart homes where commands can be given by speech. This way of communication, besides being the most natural, has the advantage of offering flexibility to the users, especially when they have limited motion capabilities. While the state of the art for widely used languages has reached a high level of performance, little effort has been made for the Romanian language. The main reason for this is the lack of an annotated speech database recorded in real-life conditions. This paper focuses on the methodology of acquiring four different speech corpora with various end-user scenarios in mind. The commands corpus is meant to be used in home automation development, the cough corpus is meant to help research in detecting distress situations, the spontaneous speech corpus will aid in distant speech recognition applications and the multi-room, multi-person, multi-language corpus can be used for research in speaker detection and identification. All these were recorded in the context of a completely automated and functional smart home. The small number of such environments available to the public makes these corpora valuable from an experimental point of view.

2013 7th Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2013

On forensic speaker recognition case pre-assessment

Gheorghe Pop; Dragos Draghicescu; Dragos Burileanu

Early forensic audio techniques were difficult to explain in court. Their applications were accepted and rejected in alternating waves, following the pace at which the judiciary and scientific communities made progress. Time and court debates had to filter the available solutions by practically assessing both analysis techniques and evidence. Nowadays, forensic scientists consider using only coherent approaches that are balanced, logical, transparent and robust while still easy to explain to the intended audience. Forensic experts say interpretation of evidence needs to be done before the analysis, so that the client has the necessary prognosis of the value of the submitted evidence. Through pre-assessment, forensic laboratories gain efficiency on the basis of a preliminary classification of evidence by relevance. The paper discusses and compares different objective criteria for case pre-assessment in forensic speaker recognition, based on the estimated quality parameters of the speech recording.

International Conference on Acoustics, Speech, and Signal Processing | 2004

An optimized TTS system implementation using a Motorola StarCore SC140-based processor

Dragos Burileanu; Andrei Fecioru; Dragos Ion; Madalin Stoica; Costel Ilas

One of the key technologies for spoken language processing is the automatic synthesis of speech. For an important number of current or future applications (including various telecommunication services and voice interfaces for mobile devices), the synthesis of good quality speech starting from unrestricted text and the efficient implementation of the corresponding synthesis systems represent very difficult tasks. The paper presents an optimized implementation of a text-to-speech (TTS) synthesis system for the Romanian language using a Motorola development platform built around a StarCore SC140-based processor. The paper emphasizes the key requirements for such an embedded implementation (especially the intelligibility/footprint combination), the problems that were encountered and the solutions found to these problems.

2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2017

Fast method for ENF database build and search

Gheorghe Pop; Dragos Draghicescu; Dragos Burileanu; Horia Cucu; Corneliu Burileanu

The field of digital audio forensics has been driving a sustained research effort in the last decade. Current digital audio authentication frameworks include the Electric Network Frequency (ENF) criterion as a must. ENF-based techniques benefit greatly from the availability of reference databases, which are built using extraction mechanisms that continuously analyze the power line signal. To find the recording time of an ENF-carrying audio file, the frequency sequence extracted from the file is matched against a reference database. A database collection method based on spectral analysis needs to trade time resolution for frequency resolution. This tradeoff usually leads to databases with less variance in the frequency series. In this paper we present a method to efficiently build an ENF database, with good time resolution, reduced storage requirements, and a fast two-step search procedure.
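
The abstract does not describe the two-step search itself, so the sketch below is a generic coarse-to-fine match and should not be read as the authors' procedure. It assumes the reference database and the query are plain lists of frequency values sampled at the same rate; the function names and the parameters `step` and `keep` are hypothetical.

```python
import random


def mean_sq_error(reference, query, offset, stride=1):
    """Mean squared difference between query and reference at `offset`,
    using every `stride`-th query sample."""
    idx = range(0, len(query), stride)
    return sum((reference[offset + k] - query[k]) ** 2 for k in idx) / len(idx)


def two_step_search(reference, query, step=4, keep=3):
    """Return the offset in `reference` where `query` matches best."""
    last = len(reference) - len(query)
    if last < 0:
        raise ValueError("query is longer than reference")
    # Step 1 (coarse): test every `step`-th offset on a subsampled query.
    coarse = sorted(
        range(0, last + 1, step),
        key=lambda o: mean_sq_error(reference, query, o, stride=step),
    )[:keep]
    # Step 2 (fine): test every offset near the best coarse candidates
    # at full resolution.
    candidates = set()
    for c in coarse:
        candidates.update(range(max(0, c - step + 1), min(last, c + step - 1) + 1))
    return min(sorted(candidates), key=lambda o: mean_sq_error(reference, query, o))


# Synthetic, slowly drifting "mains frequency" series (made-up data).
rng = random.Random(0)
reference = [50.0]
for _ in range(999):
    reference.append(reference[-1] + rng.gauss(0.0, 0.005))
query = reference[137:257]
best_offset = two_step_search(reference, query)
```

The coarse pass relies on the frequency series varying slowly, so an offset that is wrong by fewer than `step` samples still scores well enough to be refined in the second pass.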

2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2017

Building a representative audio base of syllables for Romanian language

Stefan Stelian Diaconescu; Monica Mihaela Rizea; Mihaela Ionescu; Andrei Minca; Liviu Dorobantu; Stefan Fulea; Monica Radulescu; Horia Cucu; Dragos Burileanu

The aim of this work is to provide some insights regarding the effort of building a representative and wide coverage audio base of syllables for Romanian. The audio base comprises audio recordings of syllables extracted from the following types of syllable embedding: isolated-syllable, isolated-word and continuous speech. The list of syllables has been computed over the syllabified form of single-word inflected forms. The inflected forms were generated using a general rule-based system for normal and phonetic inflection having at its core the GRAALAN (GRAmmar Abstract LANguage) metalanguage (designed for linguistic knowledge description). In addition, the word-position of a syllable was accounted for when planning the audio recordings.
