Tiberiu Boros
Romanian Academy
Publication
Featured research published by Tiberiu Boros.
conference on computational natural language learning | 2014
Tiberiu Boros; Stefan Daniel Dumitrescu; Adrian Zafiu; Verginica Barbu Mititelu; Ionut Paul Vaduva
This paper describes RACAI’s (Research Institute for Artificial Intelligence) hybrid grammatical error correction system. The system was validated through participation in the CoNLL’14 Shared Task on Grammatical Error Correction. We analyze the types of errors detected and corrected by our system, present the steps needed to reproduce our experiment, and report the results we obtained.
management of emergent digital ecosystems | 2015
Tiberiu Boros; Stefan Daniel Dumitrescu
Currently, smartphones and tablets are firmly implanted within our daily lives. These devices have an entire ecosystem devoted to them, with applications and tools designed for their specifications: they use touch-enabled interfaces and have a limited amount of memory and CPU time available for apps (16/32MB limit on Android and iOS devices). A well-established research domain is the development of natural human-computer interfaces (HCI) via voice and gestures. However, these interfaces are bound by the hardware resources available to them, and by the fact that they use network/Internet access to send/receive data, relying on dedicated servers for the decision-making process. This paper focuses on the development of small, robust deep-learning models that are designed to provide high-quality text-to-speech (TTS) functionality (one of the three main components of HCI) on smart devices, without requiring network access. We obtain very good results in TTS text sub-tasks using models significantly smaller than those used in state-of-the-art approaches.
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, August 3-4, 2017, Vancouver, Canada, ISBN 978-1-945626-70-8, pp. 174-181 | 2017
Stefan Daniel Dumitrescu; Tiberiu Boros; Dan Tufis
This paper presents RACAI’s approach, experiments and results at the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. We handle raw text and cover tokenization, sentence splitting, word segmentation, tagging, lemmatization and parsing. All results are reported under strict training, development and testing conditions, in which the corpora provided for the shared task are used “as is”, without any modifications to the composition of the train and development sets.
international conference on engineering applications of neural networks | 2013
Tiberiu Boros; Stefan Daniel Dumitrescu
Part-of-speech (POS) tagging is a key process for various natural language processing tasks, in which each word of a sentence is assigned a uniquely interpretable label (called a POS tag). There are many proposed methodologies for this task, such as Hidden Markov Models, Conditional Random Fields, Maximum Entropy classifiers, etc. Such methods are primarily intended for English which, in comparison to highly inflectional languages, has a relatively small tagset inventory. One of the well-known methods used for large tagset labeling (where the labels are referred to as morpho-syntactic descriptors or MSDs) is called Tiered Tagging (Tufis, 1999; Tufis and Dragomirescu, 2006); it exploits a reduced set of tags from which context-irrelevant features (e.g. gender information), which can be deduced through the word form’s inflectional analysis, are stripped. In our previous work we presented an alternative method to Tiered Tagging, in which we performed multi-class classification with a feed-forward neural network. Our methodology has the advantage that it does not require the extensive linguistic knowledge implied by the previously mentioned approach. We extend our work by testing our tool on Czech and successfully experimenting with a genetic algorithm designed to find a better network topology.
2013 7th Conference on Speech Technology and Human - Computer Dialogue (SpeD) | 2013
Dan Tufis; Tiberiu Boros; Stefan Daniel Dumitrescu
Recent advances in Multilingual Machine Translation and in Speech Processing, coupled with the unprecedented computing power increase of mobile devices, served by faster communication means, made possible the implementation of operational Speech to Speech (S2S) translation systems on smart phones and tablets. Through S2S, a text spoken in one language is automatically recognized, translated and synthesized in another language. This article presents an overview of the first version of our Android-based Romanian-English bi-directional speech translation system and covers the methods and technologies used for implementing it. To the best of our knowledge, this is the first bidirectional S2S for Romanian-English implemented on mobile devices.
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) | 2017
Tiberiu Boros; Sonia Pipa; Verginica Barbu Mititelu; Dan Tufis
Multiword expressions are groups of words acting as a morphologic, syntactic and semantic unit in linguistic analysis. Verbal multiword expressions represent a subgroup of multiword expressions, namely that in which a verb is the syntactic head of the group considered in its canonical (or dictionary) form. All multiword expressions are a great challenge for natural language processing, but the verbal ones are particularly interesting for tasks such as parsing, as the verb is the central element in the syntactic organization of a sentence. In this paper we introduce our data-driven approach to verbal multiword expressions, which was objectively validated during the PARSEME shared task on verbal multiword expressions identification. We tested our approach on 12 languages, and we provide detailed information about corpora composition, feature selection process, validation procedure and performance on all languages.
recent advances in natural language processing | 2017
Tiberiu Boros; Stefan Daniel Dumitrescu; Sonia Pipa
Decision trees have been previously employed in many machine-learning tasks such as part-of-speech tagging, lemmatization, morphological-attribute resolution, letter-to-sound conversion and statistical-parametric speech synthesis. In this paper we introduce an optimized tree-computation algorithm, which is based on the original ID3 algorithm. We also introduce a tree-pruning method that uses a development set to delete nodes from over-fitted models. The latter algorithm also uses a result-caching method for speed-up. Our algorithm is almost 200 times faster than a naive implementation and yields accurate results on our test datasets.
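For readers unfamiliar with ID3, the split criterion it is built on can be sketched as follows. This is a minimal illustration of the textbook information-gain computation, not the optimized algorithm, pruning method or caching scheme from the paper; the feature names (`suffix`, `capitalized`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the data on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions produced by the split.
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 picks the attribute with the highest information gain at each node."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: two features per word, one POS label per word.
rows = [
    {"suffix": "ed", "capitalized": "no"},
    {"suffix": "ed", "capitalized": "yes"},
    {"suffix": "s", "capitalized": "no"},
    {"suffix": "s", "capitalized": "yes"},
]
labels = ["VERB", "VERB", "NOUN", "NOUN"]

print(best_attribute(rows, labels, ["suffix", "capitalized"]))  # prints "suffix"
```

A full ID3 implementation would apply `best_attribute` recursively to each partition until the labels in a node are pure or no attributes remain.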
international conference on engineering applications of neural networks | 2017
Tiberiu Boros; Stefan Daniel Dumitrescu
We introduce a convolutional network architecture aimed at performing token-level processing in natural language applications. We tune this architecture for a specific task, multiword expression detection, and we compare our results to state-of-the-art systems on the same datasets. The approach is multilingual and relies on word embeddings automatically extracted from Wikipedia dumps. We also show that task-driven lexical feature embeddings increase the speed and robustness of the system compared with sparse encodings.
2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2017
Tiberiu Boros; Stefan Daniel Dumitrescu
This paper describes a data-driven approach to handling natural language interaction between humans and devices. This approach enables example-based definition and tuning of interaction scenarios. Actions and parameters can be easily configured, requiring no prior knowledge of natural language processing and no previous experience with this type of system. The platform requires a small amount of language-dependent resources, making this approach ideal for creating multilingual natural language interfaces.
language and technology conference | 2015
Tiberiu Boros; Stefan Daniel Dumitrescu
This work focuses on morphological analysis of raw text and provides a recipe for tokenization, sentence splitting and part-of-speech tagging for all languages included in the Universal Dependencies Corpus. Scalability is an important issue when dealing with large multilingual corpora. The experiments include both lightweight classifiers (linear and decision trees) and heavyweight LSTM-based architectures which are able to attain state-of-the-art results. All the experiments are carried out using the provided data “as-is”. We apply lightweight and heavyweight classifiers to 5 distinct tasks on multiple languages; we present some lessons learned during the training process; we look at per-language results as well as task averages; we present model footprints; and finally we draw a few conclusions regarding trade-offs between the classifiers’ characteristics.