Publication


Featured research published by Alexey Sorokin.


foundations of computer science | 2013

Normal Forms for Multiple Context-Free Languages and Displacement Lambek Grammars

Alexey Sorokin

We introduce a new grammar formalism, displacement context-free grammars, which is equivalent to well-nested multiple context-free grammars. We generalize the notions of Chomsky and Greibach normal forms to these grammars and show that every language without the empty word generated by a displacement context-free grammar can also be generated by a displacement Lambek grammar.


Student Sessions at the European Summer School in Logic, Language and Information | 2013

Monoid Automata for Displacement Context-Free Languages

Alexey Sorokin

In 2007 Kambites presented an algebraic interpretation of the Chomsky-Schützenberger theorem for context-free languages. We solve an analogous task for the class of displacement context-free languages, which is equivalent to the class of well-nested multiple context-free languages, giving an interpretation of the corresponding theorem for that class in terms of monoid automata. We also show how such automata can be simulated on two stacks, introducing the simultaneous two-stack automaton. We compare different variants of its definition and show their equivalence based on a geometric interpretation of its memory operations.


Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology | 2016

Using longest common subsequence and character models to predict word forms

Alexey Sorokin

This paper presents an algorithm for automatic inflection of word forms. We use the longest common subsequence method to extract abstract paradigms from given pairs of basic and inflected word forms, and suffix and prefix features to predict the paradigm automatically. We elaborate this algorithm by combining affix feature-based and character n-gram models, which substantially enhances performance, especially for languages with nonlocal phenomena such as vowel harmony. Our system took part in the SIGMORPHON 2016 Shared Task and took 3rd place in 17 of 30 subtasks and 4th place in 7 subtasks among 7 participants.
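The paradigm extraction step described in this abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the function names `lcs_pairs` and `abstract_paradigm`, the `<k>` notation for variable slots, and the tie-breaking rule in the traceback are all hypothetical and are not taken from the paper.

```python
def lcs_pairs(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start, collecting matched positions.
    pairs, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def abstract_paradigm(lemma, form):
    """Replace shared material with numbered slots (hypothetical notation)."""
    # Group matched positions into runs that are contiguous in both words.
    segments = []
    for i, j in lcs_pairs(lemma, form):
        if segments and segments[-1][-1] == (i - 1, j - 1):
            segments[-1].append((i, j))
        else:
            segments.append([(i, j)])

    def pattern(word, side):
        out, pos = [], 0
        for k, seg in enumerate(segments, 1):
            out.append(word[pos:seg[0][side]])  # unmatched material stays literal
            out.append(f"<{k}>")                # matched run becomes a variable
            pos = seg[-1][side] + 1
        out.append(word[pos:])
        return "".join(out)

    return pattern(lemma, 0), pattern(form, 1)


print(abstract_paradigm("walk", "walked"))  # ('<1>', '<1>ed')
print(abstract_paradigm("sing", "sang"))    # ('<1>i<2>', '<1>a<2>')
```

Under these assumptions, word pairs that yield the same pair of patterns share one abstract paradigm, and predicting the paradigm for an unseen lemma becomes a classification problem over such patterns.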


developments in language theory | 2014

Pumping Lemma and Ogden Lemma for Displacement Context-Free Grammars

Alexey Sorokin

The pumping lemma and the Ogden lemma offer a powerful method to prove that a particular language is not context-free. In 2008 Kanazawa proved an analogue of the pumping lemma for well-nested multiple context-free languages. However, the statement of that lemma is too weak for practical use. We prove a stronger variant of the pumping lemma and an analogue of the Ogden lemma for this language family. We also use these statements to prove that some natural context-sensitive languages cannot be generated by tree-adjoining grammars.


Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing | 2017

Spelling Correction for Morphologically Rich Language: a Case Study of Russian.

Alexey Sorokin

We present an algorithm for automatic correction of spelling errors on the sentence level, which uses a noisy channel model and feature-based reranking of hypotheses. Our system is designed for Russian and clearly outperforms the winner of the SpellRuEval-2016 competition. We show that language model size has the greatest influence on spelling correction quality. We also experiment with different types of features and show that morphological and semantic information also improves the accuracy of spellchecking.


Journal of Linguistics/Jazykovedný časopis | 2017

Text collections for evaluation of Russian morphological taggers

Olga Lyashevskaya; Victor Bocharov; Alexey Sorokin; Tatiana Shavrina; Dmitry Granovsky; Svetlana Alexeeva

The paper describes the preparation and development of the text collections within the framework of the MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate the development of automatic morphological processing technologies for Russian. The main challenge for the organizers was to standardize all available Russian corpora with manually verified high-quality tagging to a single format (Universal Dependencies CoNLL-U). The sources of the data were the disambiguated subcorpus of the Russian National Corpus, SynTagRus, OpenCorpora.org data and the GICR corpus with resolved homonymy, all exhibiting different tagsets, rules for lemmatization, pipeline architecture, technical solutions and error systematicity. The collections include both normative texts (news and modern literature) and more informal discourse (social media and spoken data). The texts are available under the CC BY-NC-SA 3.0 license.


Categories and Types in Logic, Language, and Physics | 2014

Conjoinability in 1-Discontinuous Lambek Calculus

Alexey Sorokin

In the present work we prove a conjoinability criterion for 1-discontinuous Lambek calculus. It turns out that types of this calculus are conjoinable if and only if they have the same sort and the same interpretation in the free abelian group generated by the primitive types.


artificial intelligence and natural language | 2018

Deep Convolutional Networks for Supervised Morpheme Segmentation of Russian Language

Alexey Sorokin; Anastasia Kravtsova

The present paper addresses the task of morphological segmentation for the Russian language. We show that deep convolutional neural networks solve this problem with an F1-score of 98% over morpheme boundaries and beat existing non-neural approaches.
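The metric named in this abstract, F1 over morpheme boundaries, can be sketched as follows. This is only one plausible reading: the abstract does not say how boundaries are counted, so the choice to score internal boundaries only, the helper names `boundaries` and `boundary_f1`, and the English toy word are assumptions made for illustration.

```python
def boundaries(segments):
    """Return the set of internal boundary offsets of a segmented word."""
    cuts, pos = set(), 0
    for seg in segments[:-1]:  # the end of the word is not counted (assumption)
        pos += len(seg)
        cuts.add(pos)
    return cuts


def boundary_f1(gold, pred):
    """Micro-averaged precision, recall and F1 over morpheme boundaries."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gb, pb = boundaries(g), boundaries(p)
        tp += len(gb & pb)  # boundaries found in both segmentations
        fp += len(pb - gb)  # predicted boundaries absent from the gold standard
        fn += len(gb - pb)  # gold boundaries the prediction missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example, not data from the paper.
gold = [["un", "break", "able"]]
pred = [["un", "breakable"]]
print(boundary_f1(gold, pred))
```

In the toy example the prediction finds one of the two gold boundaries and proposes no spurious ones, so precision is 1.0 and recall is 0.5.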


foundations of computer science | 2016

Ogden Property for Linear Displacement Context-Free Grammars

Alexey Sorokin

It is known that the Ogden lemma fails for the class of k-well-nested multiple context-free languages for k ≥ 3. In this article we prove a relaxed version of this lemma for linear well-nested MCFLs and show that its statement may be applied to generate counterexamples of linear well-nested MCFLs by the method already existing for the stronger variant.


FG 2014 Proceedings of the 19th International Conference on Formal Grammar - Volume 8612 | 2014

The Conjoinability Relation in Discontinuous Lambek Calculus

Alexey Sorokin

In 2013 Sorokin proved that the criterion of type conjoinability in 1-discontinuous Lambek calculus is the equality of interpretations in the free abelian group generated by primitive types. We extend the method to obtain the analogous result in full discontinuous Lambek calculus. The criterion turns out to be exactly the same as in 1-discontinuous Lambek calculus.

Collaboration


Dive into Alexey Sorokin's collaboration.

Top Co-Authors

Mikhail Burtsev

Moscow Institute of Physics and Technology

Mikhail Y. Arkhipov

Moscow Institute of Physics and Technology

Valentin Malykh

Moscow Institute of Physics and Technology
