Stefan Daniel Dumitrescu
Romanian Academy
Publication
Featured research published by Stefan Daniel Dumitrescu.
Conference on Computational Natural Language Learning | 2014
Tiberiu Boros; Stefan Daniel Dumitrescu; Adrian Zafiu; Verginica Barbu Mititelu; Ionut Paul Vaduva
This paper describes the hybrid grammatical error correction system of RACAI (Research Institute for Artificial Intelligence). The system was validated through participation in the CoNLL’14 Shared Task on Grammatical Error Correction. We analyze the types of errors detected and corrected by our system, present the steps necessary to reproduce our experiment, and report the results we obtained.
Management of Emergent Digital Ecosystems | 2015
Tiberiu Boros; Stefan Daniel Dumitrescu
Smartphones and tablets are now firmly embedded in our daily lives. These devices have an entire ecosystem devoted to them, with applications and tools designed for their specifications: they use touch-enabled interfaces and have a limited amount of memory and CPU time available for apps (16/32MB limit on Android and iOS devices). A well-established research domain is the development of natural human-computer interfaces (HCI) via voice and gestures. However, these interfaces are bound by the hardware resources available to them, and by the fact that they use network/Internet access to send and receive data, relying on dedicated servers for the decision-making process. This paper focuses on the development of small, robust deep-learning models designed to provide high-quality text-to-speech (TTS) functionality (one of the three main components of HCI) on smart devices, without requiring network access. We obtain very good results in TTS text sub-tasks using models significantly smaller than those used in state-of-the-art approaches.
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, August 3-4, 2017, Vancouver, Canada, ISBN 978-1-945626-70-8, pp. 174-181 | 2017
Stefan Daniel Dumitrescu; Tiberiu Boros; Dan Tufis
This paper presents RACAI’s approach, experiments and results at the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. We handle raw text and cover tokenization, sentence splitting, word segmentation, tagging, lemmatization and parsing. All results are reported under strict training, development and testing conditions, in which the corpora provided for the shared task are used “as is”, without any modifications to the composition of the train and development sets.
2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2017
Stefan Daniel Dumitrescu
This paper presents the architecture and technologies used to develop a voice-controlled home automation system named Cassandra. We start with the goals of the project and a system description, then focus on the main components and the way they interact with each other. We illustrate this with a scenario in which we ask the house to turn the lights off, going step by step through the communication sequence between the modules involved. The purpose of this paper is to give the reader an overview of how the system works, without going deep into technical details.
International Conference on Engineering Applications of Neural Networks | 2013
Tiberiu Boros; Stefan Daniel Dumitrescu
Part-of-speech (POS) tagging is a key process for various natural language processing tasks, in which each word of a sentence is assigned a uniquely interpretable label (called a POS tag). Many methodologies have been proposed for this task, such as Hidden Markov Models, Conditional Random Fields and Maximum Entropy classifiers. Such methods are primarily intended for English which, in comparison to highly inflectional languages, has a relatively small tagset inventory. One of the well-known methods for labeling with large tagsets (whose tags are referred to as morpho-syntactic descriptors, or MSDs) is Tiered Tagging (Tufis, 1999; Tufis and Dragomirescu, 2006). It exploits a reduced tagset from which context-irrelevant features (e.g. gender information), which can be deduced through inflectional analysis of the word form, are stripped. In our previous work we presented an alternative method to Tiered Tagging, in which we performed multi-class classification with a feed-forward neural network. Our methodology has the advantage that it does not require the extensive linguistic knowledge implied by the previously mentioned approach. We extend our work by testing our tool on Czech and by successfully experimenting with a genetic algorithm designed to find a better network topology.
2013 7th Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2013
Dan Tufis; Tiberiu Boros; Stefan Daniel Dumitrescu
Recent advances in Multilingual Machine Translation and in Speech Processing, coupled with the unprecedented increase in the computing power of mobile devices and with faster means of communication, have made possible the implementation of operational Speech to Speech (S2S) translation systems on smartphones and tablets. Through S2S, a text spoken in one language is automatically recognized, translated and synthesized in another language. This article presents an overview of the first version of our Android-based Romanian-English bidirectional speech translation system and covers the methods and technologies used for implementing it. To the best of our knowledge, this is the first bidirectional S2S system for Romanian-English implemented on mobile devices.
Management of Emergent Digital Ecosystems | 2013
Stefan Daniel Dumitrescu; Stefan Trausan-Matu; Mihaela Brut; Florence Sèdes
The paper presents a solution to the problem of capitalizing on, in different contexts and by different stakeholders, the new time-stamped documents produced by social Web sites (including news, blog entries, and uploaded documents). The core of the solution is an ontology-based method to express the topics of interest and to automatically classify them. For such textual content obtained in real time, we propose an unsupervised text classification system based on the general YAGO ontology, graph algorithms and a custom scoring method. The system shows good performance using only ontology information and the ontology structure itself. We compare our system against a classic SVM-based (Support Vector Machine) text classification approach. To determine the relevance of a specific document to a specific topic, our approach builds and compares the ontology subgraphs corresponding to the query and to the document. This gives high flexibility in reusing already classified documents when the topic of interest is refined or changed: a graph-based matching of the existing ontology-based document representation against the new query representation is enough to assess the document's relevance.
Recent Advances in Natural Language Processing | 2017
Tiberiu Boros; Stefan Daniel Dumitrescu; Sonia Pipa
Decision trees have previously been employed in many machine-learning tasks such as part-of-speech tagging, lemmatization, morphological-attribute resolution, letter-to-sound conversion and statistical-parametric speech synthesis. In this paper we introduce an optimized tree-computation algorithm, which is based on the original ID3 algorithm. We also introduce a tree-pruning method that uses a development set to delete nodes from over-fitted models. The latter algorithm also uses a results-caching method for speed-up. Our algorithm is almost 200 times faster than a naive implementation and yields accurate results on our test datasets.
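For readers unfamiliar with ID3, the sketch below illustrates the classic split criterion it relies on: choosing the feature with the highest information gain. It shows plain ID3 only, not the optimized or cached algorithm from the paper; the function names and the toy data (suffix and capitalization features) are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the partitions created by the split.
    remainder = sum(len(p) / len(labels) * entropy(p)
                    for p in partitions.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """ID3 picks the feature with the largest information gain at each node."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy data: predict a tag from two categorical features.
rows = [
    {"suffix": "ing", "capitalized": "no"},
    {"suffix": "ing", "capitalized": "yes"},
    {"suffix": "ed", "capitalized": "no"},
    {"suffix": "ed", "capitalized": "yes"},
]
labels = ["VERB_GERUND", "VERB_GERUND", "VERB_PAST", "VERB_PAST"]

chosen = best_feature(rows, labels, ["suffix", "capitalized"])
```

In this toy example the suffix separates the two classes perfectly while capitalization carries no information, so a full ID3 implementation would split on the suffix first and then recurse on each partition.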
International Conference on Engineering Applications of Neural Networks | 2017
Tiberiu Boros; Stefan Daniel Dumitrescu
We introduce a convolutional network architecture aimed at performing token-level processing in natural language applications. We tune this architecture for a specific task, multiword expression detection, and compare our results to state-of-the-art systems on the same datasets. The approach is multilingual and relies on word embeddings automatically extracted from Wikipedia dumps. We also show that task-driven lexical feature embeddings increase the speed and robustness of the system compared to sparse encodings.
2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2017
Tiberiu Boros; Stefan Daniel Dumitrescu
This paper describes a data-driven approach to handling natural language interaction between humans and devices. This approach enables example-based definition and tuning of interaction scenarios. Actions and parameters can be easily configured, requiring no prior knowledge of natural language processing and no previous experience with this type of system. The platform requires a small amount of language-dependent resources, making this approach ideal for creating multilingual natural language interfaces.