Tiberiu Boros | Researchain

Publication

Featured researches published by Tiberiu Boros.

conference on computational natural language learning | 2014

RACAI GEC -- A hybrid approach to Grammatical Error Correction

Tiberiu Boros; Stefan Daniel Dumitrescu; Adrian Zafiu; Verginica Barbu Mititelu; Ionut Paul Vaduva

This paper describes the hybrid grammatical error correction system developed at RACAI (Research Institute for Artificial Intelligence). The system was validated through our participation in the CoNLL’14 Shared Task on Grammatical Error Correction. We analyze the types of errors detected and corrected by our system, and we present the steps needed to reproduce our experiment along with the results we obtained.

management of emergent digital ecosystems | 2015

Robust deep-learning models for text-to-speech synthesis support on embedded devices

Tiberiu Boros; Stefan Daniel Dumitrescu

Smartphones and tablets are now firmly embedded in our daily lives. These devices have an entire ecosystem devoted to them, with applications and tools designed for their specifications: they use touch-enabled interfaces and have a limited amount of memory and CPU time available for apps (16/32MB limit on Android and iOS devices). A well-established research domain is the development of natural human-computer interfaces (HCI) via voice and gestures. However, these interfaces are bound by the hardware resources available to them, and by the fact that they use network/Internet access to send/receive data, relying on dedicated servers for the decision-making process. This paper focuses on the development of small, robust deep-learning models that are designed to provide high-quality text-to-speech (TTS) functionality (one of the three main components of HCI) on smart devices, without requiring network access. We obtain very good results in TTS text sub-tasks using models significantly smaller than those used in state-of-the-art approaches.

Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, August 3-4, 2017, Vancouver, Canada, ISBN 978-1-945626-70-8, pp. 174-181 | 2017

RACAI's Natural Language Processing pipeline for Universal Dependencies.

Stefan Daniel Dumitrescu; Tiberiu Boros; Dan Tufis

This paper presents RACAI’s approach, experiments and results at the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. We handle raw text and we cover tokenization, sentence splitting, word segmentation, tagging, lemmatization and parsing. All results are reported under strict training, development and testing conditions, in which the corpora provided for the shared task are used “as is”, without any modifications to the composition of the train and development sets.

international conference on engineering applications of neural networks | 2013

Improving the RACAI Neural Network MSD Tagger

Tiberiu Boros; Stefan Daniel Dumitrescu

Part-of-speech (POS) tagging is a key process for various natural language processing tasks, in which each word of a sentence is assigned a uniquely interpretable label (called a POS tag). Many methodologies have been proposed for this task, such as Hidden Markov Models, Conditional Random Fields and Maximum Entropy classifiers. Such methods are primarily intended for English which, in comparison to highly inflectional languages, has a relatively small tagset inventory. One of the well-known methods used for large tagset labeling (where the tags are referred to as morpho-syntactic descriptors or MSDs) is called Tiered Tagging (Tufis, 1999), (Tufis and Dragomirescu, 2006). It exploits a reduced set of tags from which context-irrelevant features (e.g. gender information), which can be deduced through the word form’s flectional analysis, are stripped. In our previous work we presented an alternative method to Tiered Tagging, in which we performed multi-class classification with a feed-forward neural network. Our methodology has the advantage that it does not require the extensive linguistic knowledge implied by the previously mentioned approach. We extend our work by testing our tool on Czech and successfully experimenting with a genetic algorithm designed to find a better network topology.

2013 7th Conference on Speech Technology and Human - Computer Dialogue (SpeD) | 2013

The RACAI speech translation system challenges of morphologically rich languages

Dan Tufis; Tiberiu Boros; Stefan Daniel Dumitrescu

Recent advances in Multilingual Machine Translation and in Speech Processing, coupled with the unprecedented computing power increase of mobile devices, served by faster communication means, made possible the implementation of operational Speech to Speech (S2S) translation systems on smart phones and tablets. Through S2S, a text spoken in one language is automatically recognized, translated and synthesized in another language. This article presents an overview of the first version of our Android-based Romanian-English bi-directional speech translation system and covers the methods and technologies used for implementing it. To the best of our knowledge, this is the first bidirectional S2S for Romanian-English implemented on mobile devices.

Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) | 2017

A data-driven approach to verbal multiword expression detection. PARSEME Shared Task system description paper

Tiberiu Boros; Sonia Pipa; Verginica Barbu Mititelu; Dan Tufis

Multiword expressions are groups of words acting as a morphologic, syntactic and semantic unit in linguistic analysis. Verbal multiword expressions represent a subgroup of multiword expressions, namely that in which a verb is the syntactic head of the group considered in its canonical (or dictionary) form. All multiword expressions are a great challenge for natural language processing, but the verbal ones are particularly interesting for tasks such as parsing, as the verb is the central element in the syntactic organization of a sentence. In this paper we introduce our data-driven approach to verbal multiword expressions, which was objectively validated during the PARSEME shared task on verbal multiword expressions identification. We tested our approach on 12 languages, and we provide detailed information about corpora composition, feature selection process, validation procedure and performance on all languages.

recent advances in natural language processing | 2017

Fast and Accurate Decision Trees for Natural Language Processing Tasks.

Tiberiu Boros; Stefan Daniel Dumitrescu; Sonia Pipa

Decision trees have been previously employed in many machine-learning tasks such as part-of-speech tagging, lemmatization, morphological-attribute resolution, letter-to-sound conversion and statistical-parametric speech synthesis. In this paper we introduce an optimized tree-computation algorithm, which is based on the original ID3 algorithm. We also introduce a tree-pruning method that uses a development set to delete nodes from over-fitted models. The latter algorithm also uses a results-caching method for speed-up. Our algorithm is almost 200 times faster than a naive implementation and yields accurate results on our test datasets.
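For readers unfamiliar with ID3, the split criterion that the abstract builds on can be sketched as follows. This is a minimal textbook illustration of information gain, not the optimized algorithm, pruning method or caching scheme described in the paper; the function names and the dictionary-based row format are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature.

    `rows` is a list of dicts mapping feature names to values
    (a hypothetical format chosen for this sketch).
    """
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the partitions created by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """Textbook ID3 step: choose the feature with the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))
```

A naive implementation recomputes these counts at every node for every candidate feature, which is the kind of repeated work that caching of intermediate results can avoid.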

international conference on engineering applications of neural networks | 2017

A Convolutional Approach to Multiword Expression Detection Based on Unsupervised Distributed Word Representations and Task-Driven Embedding of Lexical Features

Tiberiu Boros; Stefan Daniel Dumitrescu

We introduce a convolutional network architecture aimed at performing token-level processing in natural language applications. We tune this architecture for a specific task, multiword expression detection, and we compare our results to state-of-the-art systems on the same datasets. The approach is multilingual and relies on word embeddings automatically extracted from Wikipedia dumps. We also show that task-driven lexical feature embeddings increase the speed and robustness of the system compared to sparse encodings.

2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2017

A “small-data”-driven approach to dialogue systems for natural language human computer interaction

Tiberiu Boros; Stefan Daniel Dumitrescu

This paper describes a data-driven approach to handling natural language interaction between humans and devices. This approach enables example-based definition and tuning of interaction scenarios. Actions and parameters can be easily configured, requiring no prior knowledge of natural language processing and no previous experience with this type of system. The platform requires a small amount of language-dependent resources, making this approach ideal for creating multilingual natural language interfaces.

language and technology conference | 2015

Multilingual Tokenization and Part-of-speech Tagging. Lightweight Versus Heavyweight Algorithms

Tiberiu Boros; Stefan Daniel Dumitrescu

This work focuses on morphological analysis of raw text and provides a recipe for tokenization, sentence splitting and part-of-speech tagging for all languages included in the Universal Dependencies Corpus. Scalability is an important issue when dealing with large multilingual corpora. The experiments include both lightweight classifiers (linear and decision trees) and heavyweight LSTM-based architectures which are able to attain state-of-the-art results. All the experiments are carried out using the provided data “as-is”. We apply lightweight and heavyweight classifiers to 5 distinct tasks on multiple languages; we present some lessons learned during the training process; we look at per-language results as well as task averages; we present model footprints; and finally we draw a few conclusions regarding trade-offs between the classifiers’ characteristics.
