Alina Maria Ciobanu
University of Bucharest
Publication
Featured research published by Alina Maria Ciobanu.
conference of the european chapter of the association for computational linguistics | 2014
Vlad Niculae; Marcos Zampieri; Liviu P. Dinu; Alina Maria Ciobanu
This paper presents a novel approach to the task of temporal text classification combining text ranking and probability for the automatic dating of historical texts. The method was applied to three historical corpora: an English, a Portuguese and a Romanian corpus. It obtained performance ranging from 83% to 93% accuracy, using a fully automated approach with very basic features.
meeting of the association for computational linguistics | 2014
Alina Maria Ciobanu; Liviu P. Dinu
Words undergo various changes when entering new languages. Based on the assumption that these linguistic changes follow certain rules, we propose a method for automatically detecting pairs of cognates employing an orthographic alignment method which proved relevant for sequence alignment in computational biology. We use aligned subsequences as features for machine learning algorithms in order to infer rules for linguistic changes undergone by words when entering new languages and to discriminate between cognates and non-cognates. Given a list of known cognates, our approach does not require any other linguistic information. However, it can be customized to integrate historical information regarding language evolution.
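The alignment step described above can be illustrated with a minimal sketch. The abstract does not name the algorithm, so this example assumes Needleman-Wunsch global alignment, a standard method from computational biology. The scoring values (`match`, `mismatch`, `gap`), the function names, and the `x>y` feature format are all hypothetical choices made for illustration, not the authors' implementation.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two strings (Needleman-Wunsch); gaps are shown as '-'."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def mismatch_features(aligned_a, aligned_b, n=2):
    """Collect aligned n-gram pairs that differ (hypothetical feature format)."""
    feats = []
    for k in range(len(aligned_a) - n + 1):
        x, y = aligned_a[k:k + n], aligned_b[k:k + n]
        if x != y:
            feats.append(f"{x}>{y}")
    return feats
```

Features of this kind could then be passed to any standard classifier to separate cognates from non-cognates; which classifier and which n-gram sizes work best is not stated in the abstract.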
empirical methods in natural language processing | 2014
Alina Maria Ciobanu; Liviu P. Dinu
In this paper we propose a computational method for determining the orthographic similarity between Romanian and related languages. We account for etymons and cognates and we investigate not only the number of related words, but also their forms, quantifying orthographic similarities. The method we propose is adaptable to any language, as long as resources are available.
north american chapter of the association for computational linguistics | 2015
Marcos Zampieri; Alina Maria Ciobanu; Vlad Niculae; Liviu P. Dinu
This paper describes the AMBRA system, entered in the SemEval-2015 Task 7: ‘Diachronic Text Evaluation’ subtasks one and two, which consist of predicting the date when a text was originally written. The task is valuable for applications in digital humanities, information systems, and historical linguistics. The novelty of this shared task consists of incorporating label uncertainty by assigning an interval within which the document was written, rather than assigning a clear time marker to each training document. To deal with non-linear effects and variable degrees of uncertainty, we reduce the problem to pairwise comparisons of the form “is Document A older than Document B?”, and propose a nonparametric way to transform the ordinal output into time intervals.
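The reduction to pairwise comparisons can be sketched as follows. This is a minimal illustration, not the AMBRA implementation: the function name `pairwise_examples`, the `(doc_id, start_year, end_year)` tuple layout, and the rule of skipping documents whose intervals overlap are all assumptions introduced here.

```python
from itertools import combinations


def pairwise_examples(docs):
    """Turn interval-labeled documents into (older_id, newer_id) pairs.

    docs: list of (doc_id, start_year, end_year) tuples.
    """
    pairs = []
    for (a, a_lo, a_hi), (b, b_lo, b_hi) in combinations(docs, 2):
        if a_hi < b_lo:
            pairs.append((a, b))   # a ends before b starts, so a is older
        elif b_hi < a_lo:
            pairs.append((b, a))   # b ends before a starts, so b is older
        # Overlapping intervals give no reliable order; skipped (assumption).
    return pairs


docs = [("d1", 1700, 1710), ("d2", 1705, 1720), ("d3", 1800, 1810)]
print(pairwise_examples(docs))
```

A ranking model trained on such pairs produces an ordering of documents; mapping that ordering back to time intervals is the nonparametric step the abstract mentions, which is not reproduced here.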
international joint conference on natural language processing | 2015
Alina Maria Ciobanu; Liviu P. Dinu
Identifying the type of relationship between words provides a deeper insight into the history of a language and allows a better characterization of language relatedness. In this paper, we propose a computational approach for discriminating between cognates and borrowings. We show that orthographic features have discriminative power and we analyze the underlying linguistic factors that prove relevant in the classification task. To our knowledge, this is the first attempt of this kind.
conference of the european chapter of the association for computational linguistics | 2014
Alina Maria Ciobanu; Anca Dinu; Liviu P. Dinu
We train and evaluate two models for Romanian stress prediction: a baseline model which employs the consonant-vowel structure of the words and a cascaded model with averaged perceptron training consisting of two sequential models, one for predicting syllable boundaries and another for predicting stress placement. We show in this paper that Romanian stress is predictable, though not deterministic, by using data-driven machine learning techniques.
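The consonant-vowel structure used by the baseline can be illustrated with a short sketch. The vowel inventory and the function name `cv_pattern` are assumptions made for this example; the abstract does not say how the authors encode the structure, and this version classifies each letter independently, ignoring semivowels and digraphs.

```python
# Assumed Romanian vowel letters, including the current and older orthography.
VOWELS = set("aeiouăâî")


def cv_pattern(word):
    """Map each letter to 'V' (vowel) or 'C' (consonant); skip non-letters."""
    return "".join(
        "V" if ch in VOWELS else "C"
        for ch in word.lower()
        if ch.isalpha()
    )


print(cv_pattern("copil"))
```

A pattern string like this could serve as the input to a simple classifier that predicts which vowel position carries the stress.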
Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR) | 2014
Alina Maria Ciobanu; Liviu P. Dinu
In this paper we investigate the impact of translation on readability. We propose a quantitative analysis of several shallow, lexical and morpho-syntactic features that have been traditionally used for assessing readability and have proven relevant for this task. We conduct our experiments on a parallel corpus of transcribed parliamentary sessions and we investigate readability metrics for the original segments of text, written in the language of the speaker, and their translations.
Procedia Computer Science | 2016
Alina Maria Ciobanu
We propose a sequence labeling approach to cognate production based on the orthography of the words. Our approach leverages the idea that orthographic changes represent sound correspondences to a fairly large extent. Given an input word in language L1, we seek to determine its cognate pair in language L2. To this end, we employ a sequential model which captures the intuition that orthographic changes are highly dependent on the context in which they occur. We apply our method on two pairs of languages. Finally, we investigate how second language learners perceive the orthographic changes from their mother tongue to the language they learn.
Archive | 2018
Liviu P. Dinu; Alina Maria Ciobanu
Languages borrow words from one another for various reasons. How the borrowing process takes place and how new words enter a recipient language are key questions of historical linguistics. In this paper, we propose a multilingual method for word form production based on the orthography of the words. For borrowed words, we investigate the derivation from a donor language into a recipient language. We also address the problem of deriving genetic cognates. We experiment with Romanian as a recipient language and we investigate borrowings from multiple donor languages. The advantages of the proposed method are that it does not use any external knowledge, except for the training word pairs, and it does not require the phonetic transcriptions of the input words.
international conference on computational linguistics | 2017
Alina Maria Ciobanu; Liviu P. Dinu; Andrea Sgarro
In this paper we propose a computational method for determining the syntactic similarity between languages. We investigate multiple approaches and metrics, showing that the results are consistent across methods. We report results on 16 languages belonging to various language families. The analysis that we conduct is adaptable to any language, as long as resources are available.