Anca Dinu
University of Bucharest
Publication
Featured research published by Anca Dinu.
international conference on computational linguistics | 2005
Anca Dinu; Liviu P. Dinu
In this paper we study the syllabic similarity between Romance languages via rank distance. The results confirm the linguistic theories while adding quantification and rigor.
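The abstract does not spell out how rank distance is computed. As a minimal sketch, assuming one common formulation (an item at the top of a ranking of length n scores n, the last item scores 1, an absent item scores 0, and the distance is the sum of absolute score differences), it could look like the following. The function name and the toy syllable rankings are hypothetical.

```python
def rank_distance(sigma, tau):
    """Sum of absolute score differences over all items in either ranking.

    Assumes each ranking is a list without duplicates, best item first.
    """
    def scores(ranking):
        n = len(ranking)
        # Top item gets score n, last item gets 1; absent items count as 0.
        return {item: n - pos for pos, item in enumerate(ranking)}

    s, t = scores(sigma), scores(tau)
    return sum(abs(s.get(x, 0) - t.get(x, 0)) for x in set(s) | set(t))


# Hypothetical frequency rankings of syllables in two languages.
lang_a = ["la", "de", "re", "te"]
lang_b = ["de", "la", "te", "ca"]
print(rank_distance(lang_a, lang_b))  # 6
```

Under this convention, disagreements near the top of the rankings weigh more than disagreements near the bottom, because missing or displaced high-ranked items carry larger scores.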
international conference on computational linguistics | 2005
Anca Dinu; Liviu P. Dinu
In this paper we propose a parallel approach to syllabification by introducing parallel extensions of insertion grammars. We apply these grammars to the syllabification of Romanian.
symbolic and numeric algorithms for scientific computing | 2011
Anca Dinu
In this article we propose a quantitative approach to a relatively new problem: categorizing text as pragmatically correct or pragmatically incorrect (or, stretching the notion, coherent/incoherent). Typical text categorization criteria include categorization by topic, by style (genre classification, authorship identification), by expressed opinion (opinion mining, sentiment classification), etc. Very few approaches consider the problem of categorizing text by degree of coherence. One application of categorizing text by coherence is a spam filter for personal e-mail accounts that can cope with one of the new strategies adopted by spammers. This strategy consists of encoding the real message as a picture (which classical text-oriented filters cannot directly analyze and reject) and accompanying it with a text specially designed to get past the filter. An important question for automatically categorizing texts as coherent or incoherent is: are there features that can be extracted from these texts and successfully used to categorize them? We propose a quantitative approach that uses ratios between the morphological categories in the texts as discriminant features. We apply supervised machine learning techniques to a small corpus of English e-mail messages and let the algorithms extract important features from all the part-of-speech (POS) ratios. The results are encouraging.
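To make the idea of ratios between morphological categories concrete, here is a minimal sketch that turns a sequence of part-of-speech tags into a dictionary of pairwise ratios. The tag inventory, the function name and the add-one smoothing of the denominator are assumptions made for illustration; the abstract does not say which tagset or normalization was used.

```python
from collections import Counter
from itertools import permutations


def pos_ratios(tags, categories=("NOUN", "VERB", "ADJ", "ADV", "PRON")):
    """Return one feature per ordered pair of categories: count(a) / (count(b) + 1)."""
    counts = Counter(tags)
    features = {}
    for a, b in permutations(categories, 2):
        # Adding 1 to the denominator (an assumption) avoids division by zero
        # for categories that do not occur in a short message.
        features[f"{a}/{b}"] = counts[a] / (counts[b] + 1)
    return features


# Hypothetical tag sequence for one short e-mail message.
tags = ["PRON", "VERB", "NOUN", "ADJ", "NOUN", "VERB", "NOUN"]
feats = pos_ratios(tags)
print(feats["NOUN/VERB"])  # 3 / (2 + 1) = 1.0
```

A feature dictionary of this kind could then be fed to any supervised classifier, which is left out here because the abstract does not name the algorithms used.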
Fundamenta Informaticae | 2011
Anca Dinu
We show in this paper how the computer science concept of ‘continuations’, together with categorial grammars and a type-shifting mechanism, can account for a wide range of natural language semantic phenomena, such as hierarchical discourse structure, ellipses, accommodation, and free-focus and bound-focus anaphora. The merit of continuations in the dynamic semantics framework is that they abstract away from the assignment functions that are essential to the formulations of Dynamic Intensional Logic, Dynamic Montague Grammar, Dynamic Predicate Logic and Discourse Representation Theory. Thus, continuation-style semantics does not pose problems such as the destructive assignment problem in Dynamic Predicate Logic or the variable clash problem in Discourse Representation Theory. We argue that continuations are a versatile and powerful tool, particularly well suited to manipulating scope and long-distance dependencies, phenomena that abound in natural language semantics.
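As a rough illustration of why continuations suit scope manipulation, the sketch below treats each noun phrase as a function that takes its continuation (the rest of the sentence) as an argument. This is a generic continuation-passing toy, not the paper's grammar: the domain, the relation and all function names are hypothetical.

```python
# Hypothetical toy model: three individuals and a "saw" relation.
DOMAIN = {"ann", "bob", "cat"}
SAW = {("ann", "bob"), ("bob", "cat"), ("cat", "ann")}


def lift(x):
    """Type-shift an individual into a function over its continuation."""
    return lambda k: k(x)


def everyone(k):
    """Universal quantifier: the continuation must hold of every individual."""
    return all(k(x) for x in DOMAIN)


def someone(k):
    """Existential quantifier: the continuation must hold of some individual."""
    return any(k(x) for x in DOMAIN)


def saw(subj, obj, subject_wide=True):
    """Combine subject and object; the evaluation order decides who takes scope."""
    if subject_wide:
        return subj(lambda x: obj(lambda y: (x, y) in SAW))
    return obj(lambda y: subj(lambda x: (x, y) in SAW))


print(saw(everyone, someone))                      # True: each person saw somebody
print(saw(everyone, someone, subject_wide=False))  # False: nobody was seen by all
print(saw(lift("ann"), lift("bob")))               # True
```

The two readings of "everyone saw someone" differ only in which noun phrase receives the other as part of its continuation, with no variable assignments involved.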
conference of the european chapter of the association for computational linguistics | 2014
Alina Maria Ciobanu; Anca Dinu; Liviu P. Dinu
We train and evaluate two models for Romanian stress prediction: a baseline model which employs the consonant-vowel structure of the words, and a cascaded model with averaged perceptron training consisting of two sequential models, one for predicting syllable boundaries and another for predicting stress placement. We show in this paper that Romanian stress is predictable, though not deterministic, by using data-driven machine learning techniques.
Proceedings of the Workshop on Linguistic Distances | 2006
Anca Dinu; Liviu P. Dinu
In this paper we propose two metrics for use in various areas of computational linguistics. Our construction is based on the assumption that in most natural languages the most important information is carried by the first part of the unit. We introduce total rank distance and scaled total rank distance, prove that they are metrics, and investigate their maximum and expected values. Finally, a short application is presented: we investigate the similarity of Romance languages by computing the scaled total rank distance between the digram rankings of each language.
international conference information processing | 2018
Anca Dinu; Liviu P. Dinu; Laura Franzoi; Andrea Sgarro
In this paper we deal with distances for fuzzy strings in \([0,1]^n\), to be used in distance-based linguistic classification. We start from the fuzzy Hamming distance, anticipated by the linguist Muljacic back in 1967, and the taxicab distance, both of which generalize the usual crisp Hamming distance: the first uses the standard logical operations of minimum for conjunctions and maximum for disjunctions, while the second uses Łukasiewicz’ T-norms and T-conorms. We resort to the Steinhaus transform, a powerful tool which allows one to deal with linguistic data which are not only fuzzy, but possibly also irrelevant or logically inconsistent. Experimental results on actual data are shown and given a preliminary discussion.
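The relation between the two distances can be sketched in a few lines, assuming that each distance is the sum over positions of a fuzzy exclusive-or, (x AND NOT y) OR (NOT x AND y), with negation taken as 1 - x. The abstract does not give this formula, so the construction and all function names below are assumptions. With minimum and maximum the sum gives a fuzzy Hamming distance; with the Łukasiewicz operations each term reduces to |x - y|, which is the taxicab distance.

```python
def fuzzy_xor(x, y, t_norm, t_conorm):
    """(x AND NOT y) OR (NOT x AND y), with NOT a = 1 - a."""
    return t_conorm(t_norm(x, 1 - y), t_norm(1 - x, y))


def distance(u, v, t_norm, t_conorm):
    """Sum the fuzzy exclusive-or over aligned positions of two fuzzy strings."""
    return sum(fuzzy_xor(x, y, t_norm, t_conorm) for x, y in zip(u, v))


def fuzzy_hamming(u, v):
    # Standard operations: minimum for conjunction, maximum for disjunction.
    return distance(u, v, min, max)


def luk_and(a, b):
    # Lukasiewicz T-norm.
    return max(0.0, a + b - 1.0)


def luk_or(a, b):
    # Lukasiewicz T-conorm.
    return min(1.0, a + b)


def lukasiewicz_distance(u, v):
    return distance(u, v, luk_and, luk_or)


u = [0.0, 0.25, 1.0, 0.5]
v = [1.0, 0.75, 1.0, 0.5]
print(fuzzy_hamming(u, v))                   # 2.25
print(lukasiewicz_distance(u, v))            # 1.5
print(sum(abs(x - y) for x, y in zip(u, v))) # 1.5, the taxicab distance
```

On crisp strings (all values 0 or 1) both functions return the ordinary Hamming distance. In the last position of the example, the min/max version contributes 0.5 even though both strings hold the same value 0.5, while the Łukasiewicz version contributes 0.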
recent advances in natural language processing | 2017
Anca Dinu; Liviu P. Dinu; Bogdan Dumitru
In this article we propose a stylistic analysis of Solomon Marcus’ non-scientific published texts, gathered in six volumes, aiming to uncover some of his quantitative and qualitative fingerprints. Moreover, we compare and cluster two distinct periods in his writing: 22 years of communist regime (1967-1989) and 27 years of democracy (1990-2016). The distributional analysis of Marcus’ texts reveals that the transition from the communist regime to democracy is sharply marked by two complementary changes in his writing: in the pre-democracy period, the communist norms of writing style demanded, on the one hand, long phrases, long words and clichés, and on the other hand, a short list of preferred “official” topics; under democracy, the tendency was toward shorter phrases and words and a broader range of topics.
language resources and evaluation | 2008
Liviu P. Dinu; Marius Popescu; Anca Dinu
sighum workshop on language technology for cultural heritage social sciences and humanities | 2013
Alina Maria Ciobanu; Anca Dinu; Liviu P. Dinu; Vlad Niculae; Octavia-Maria Şulea