Publication


Featured research published by Simon Dobnik.


Proceedings of the Third Arabic Natural Language Processing Workshop | 2017

Identification of Languages in Algerian Arabic Multilingual Documents.

Wafia Adouane; Simon Dobnik

This paper presents a language identification system designed to detect the language of each word, in its context, in multilingual documents as generated in social media by bilingual/multilingual communities, in our case speakers of Algerian Arabic. We frame the task as a sequence tagging problem and use supervised machine learning with standard methods such as HMM and n-gram classification tagging. We also experiment with a lexicon-based method. Combining all the methods in a fall-back mechanism and introducing some linguistic rules to deal with unseen tokens and ambiguous words gives an overall accuracy of 93.14%. Finally, we introduce rules for language identification from sequences of recognised words.
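The fall-back idea described in the abstract above can be illustrated with a minimal sketch. It is not the authors' system: the tagger interface, the toy lexicon, the `digit_rule` helper and the labels `FRA`, `ALG`, `NUM` and `UNK` are all hypothetical, and the HMM and n-gram taggers are left out. The sketch only shows how taggers can be tried in priority order, each falling back to the next when it cannot decide.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A tagger looks at a token in its context and returns a language label,
# or None when it cannot decide (hypothetical interface).
Tagger = Callable[[Sequence[str], int], Optional[str]]


def lexicon_tagger(lexicon: Dict[str, str]) -> Tagger:
    """Build a tagger that looks tokens up in a word -> label lexicon."""
    def tag(tokens: Sequence[str], i: int) -> Optional[str]:
        return lexicon.get(tokens[i].lower())
    return tag


def digit_rule(tokens: Sequence[str], i: int) -> Optional[str]:
    """Toy rule for unseen tokens: purely numeric tokens get the label NUM."""
    return "NUM" if tokens[i].isdigit() else None


def tag_with_fallback(tokens: Sequence[str],
                      taggers: Sequence[Tagger],
                      default: str = "UNK") -> List[str]:
    """Tag each token with the first tagger that returns a label."""
    labels = []
    for i in range(len(tokens)):
        label = default
        for tagger in taggers:
            guess = tagger(tokens, i)
            if guess is not None:
                label = guess
                break  # a higher-priority tagger decided; stop falling back
        labels.append(label)
    return labels


# Hypothetical toy data, for illustration only.
toy_lexicon = {"bonjour": "FRA", "sahbi": "ALG"}
toy_tokens = ["Bonjour", "sahbi", "2017", "xyz"]
toy_labels = tag_with_fallback(toy_tokens, [lexicon_tagger(toy_lexicon), digit_rule])
```

In a fuller version, a statistical tagger that returns a label for every token would sit last in the list, so the order of `taggers` encodes which method is trusted most.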


Archive | 2016

A Model for Attention-Driven Judgements in Type Theory with Records

Simon Dobnik; John D. Kelleher

This paper makes three contributions to the discussion on the applicability of Type Theory with Records (TTR) to embodied dialogue agents. First, it highlights the problem of type assignment, or judgements, in practical implementations, which is resource intensive. Second, it presents a judgement control mechanism that addresses this problem. The mechanism consists of grouping types into clusters or states by their thematic relations and selecting types following two mechanisms inspired by the Load Theory of selective attention and cognitive control (Lavie et al., 2004). Third, it presents a computational framework, based on Bayesian inference, that offers a basis for future practical experimentation on the feasibility of the proposed approach.

Both authors contributed equally.

1 Type Theory with Records

One of the central challenges for multi-modal dialogue systems is information fusion, or how such a system can represent information from different domains, compare it, compose it, and reason about it. Typically, a situated agent will have to deal with information that comes from its perceptual sensors, represented as real-valued vectors, and with conceptual categories (some of which correspond to words in language) that are formed through cognitive processes in the brain. When situated agents are implemented practically, one typically adopts a layered approach starting at the scene geometry and finishing at the level of the agent’s knowledge about the objects and their interactions (Kruijff et al., 2006). Although this approach may be good for practical reasons (for example, there are pre-existing systems which may be organised in a pipeline this way), it also assumes that representations and operations are distinct at each level and that one needs to design interfaces that would mediate between these levels.
Type Theory with Records (TTR) (Cooper, 2005; Cooper, 2012; Cooper et al., 2015) provides a theory of natural language semantics which views meaning and reference assignment as being in the domain of an individual agent who can make judgements about situations (or invariances in the world) being of types (written as a : T). The type inventory of an agent is not static but is continuously refined through the agent’s interaction with its physical environment and with other agents through dialogue interaction, which provides instances and feedback on what strategies to adopt to learn from these instances. The reason why agents’ meaning representations or type inventories converge to an approximately identical inventory is that agents are situated in an identical or sufficiently similar physical environment and have grounded conversations with other agents; see for example the work of Steels and Belpaeme (2005), and Larsson (2013) for an approach in TTR. Having the capability to adjust their type representations, agents can adapt to new physical environments and new conversational exchanges. Such a view is not novel to mobile robotics (Dissanayake et al., 2001) nor to approaches to the semantics and pragmatics of dialogue (Clark, 1996), but it is novel to formal semantics (Dowty et al., 1981; Blackburn and Bos, 2005), which represents an important body of work on how meaning is constructed compositionally and reasoned about. Overall, we see TTR as a highly fitting framework for modelling cognitive situated agents as it connects perception and high-level semantics of natural language and vice versa.

The type system in TTR is rich in comparison to that found in traditional formal semantics (entities, truth values and function types constructed from these and other function types). In addition, types are used to model meaning in a proof-theoretic way rather than constraining model-theoretic interpretation. Types in TTR can be either basic types, such as Ind or Real, or record types. Record types are represented as matrices containing label-value pairs where labels are constants and values can be basic types, ptypes (which act as type constructors) or record types. The corresponding proof-objects of record types are records. These may be thought of as iconic representations (Harnad, 1990) or sensory readings that an agent perceives as sensory projections of objects or situations in the world. The example below shows a judgement that a record (a matrix with = as a delimiter) containing a sensory reading is of a type (with : as a delimiter). The traditional distinction between symbolic and sub-symbolic knowledge is not maintained in this framework as both can be assigned appropriate types.

[ a   = ind26
  sr  = [[34,24],[56,78], ...]
  loc = [45,78,0.34] ]

Appears in JerSem: The 20th Workshop on the Semantics and Pragmatics of Dialogue, edited by Julie Hunter, Mandy Simons and Matthew Stone. New Brunswick, NJ, USA, July 16-18, 2016, pages 25-34.
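The record judgement described in the abstract above can be sketched in code. This is a minimal illustration under stated assumptions, not the paper's implementation. Basic types are modelled as Python predicates and record types as dictionaries from labels to types. The `list_of` helper and the field types chosen for `sr` and `loc` are hypothetical, since the listing does not show the type side of the judgement. Ptypes and dependent fields are omitted. The sketch also assumes that a record may carry extra fields beyond those its type requires.

```python
from typing import Any, Callable, Dict, Union

# A basic type is modelled as a predicate; a record type maps labels to types.
BasicType = Callable[[Any], bool]
RecordType = Dict[str, Any]
Type = Union[BasicType, RecordType]


def Ind(x: Any) -> bool:
    """Toy stand-in for the basic type Ind: identifiers such as 'ind26'."""
    return isinstance(x, str) and x.startswith("ind")


def Real(x: Any) -> bool:
    """Toy stand-in for the basic type Real: any int or float (not bool)."""
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def list_of(t: Type) -> BasicType:
    """Hypothetical helper: the type of lists whose elements are all of type t."""
    return lambda xs: isinstance(xs, list) and all(judge(x, t) for x in xs)


def judge(obj: Any, typ: Type) -> bool:
    """Return True if the judgement obj : typ holds in this toy model."""
    if isinstance(typ, dict):  # record type: check every required field
        return isinstance(obj, dict) and all(
            label in obj and judge(obj[label], field_type)
            for label, field_type in typ.items()
        )
    return typ(obj)  # basic type: ask the predicate


# The record from the listing, with the elided part of 'sr' left out.
record = {"a": "ind26", "sr": [[34, 24], [56, 78]], "loc": [45, 78, 0.34]}

# Hypothetical record type for it; the field types are assumptions.
record_type = {"a": Ind, "sr": list_of(list_of(Real)), "loc": list_of(Real)}

holds = judge(record, record_type)
```

Because both the symbolic field `a` and the numeric fields `sr` and `loc` go through the same `judge` function, the sketch mirrors the point that symbolic and sub-symbolic information can be typed in a single system.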


International Conference on Computational Linguistics | 2014

Exploration of functional semantics of prepositions from corpora of descriptions of visual scenes

Simon Dobnik; John D. Kelleher

We present a method of extracting functional semantic knowledge from corpora of descriptions of visual scenes. Such knowledge is required for interpretation and generation of spatial descriptions in tasks such as visual search. We identify semantic classes of target and landmark objects related by each preposition by abstracting over the WordNet taxonomy. The inclusion of such knowledge in visual search should equip robots with a better, more human-like spatial cognition.


Archive | 2017

Back to the Future: Logic and Machine Learning

Simon Dobnik; John D. Kelleher

In this paper we argue that since the beginning of natural language processing or computational linguistics there has been a strong connection between logic and machine learning. First of all, there is something logical about language or linguistic about logic. Secondly, we argue that rather than distinguishing between logic and machine learning, a more useful distinction is between top-down approaches and data-driven approaches. Examining some recent approaches in deep learning, we argue that they incorporate both properties and that this is the reason for their very successful adoption to solve several problems within language technology.


Journal of Language Modelling | 2017

Interfacing language, spatial perception and cognition in Type Theory with Records

Simon Dobnik; Robin Cooper

We argue that computational modelling of perception, action, language, and cognition introduces several requirements on a formal semantic theory and its practical implementations. Using examples of semantic representations of spatial descriptions we show how Type Theory with Records (TTR) satisfies these requirements.


Archive | 2016

Towards a Computational Model of Frame of Reference Alignment in Swedish Dialogue

Simon Dobnik; Christine Howes; Kim Demaret; John D. Kelleher

In this paper we examine how people negotiate, interpret and repair the frame of reference (FoR) in online text-based dialogues discussing spatial scenes in Swedish. We describe work in progress in which participants are given different perspectives of the same scene and asked to locate several objects that are only shown in one of their pictures. This task requires participants to coordinate on FoR in order to identify the missing objects. This study has implications for situated dialogue systems.


Archive | 2017

What is not where: the challenge of integrating spatial representations into deep learning architectures

John D. Kelleher; Simon Dobnik


Meeting of the Association for Computational Linguistics | 2018

Improving Neural Network Performance by Injecting Background Knowledge: Detecting Code-switching and Borrowing in Algerian texts.

Wafia Adouane; Jean-Philippe Bernardy; Simon Dobnik


Language Resources and Evaluation | 2018

Shami: A Corpus of Levantine Arabic Dialects.

Chatrine Qwaider; Motaz Saad; Stergios Chatzikyriakidis; Simon Dobnik


Proceedings of the Second Workshop on Subword/Character LEvel Models | 2018

A Comparison of Character Neural Language Model and Bootstrapping for Language Identification in Multilingual Noisy Texts

Wafia Adouane; Simon Dobnik; Jean-Philippe Bernardy; Nasredine Semmar

Collaboration


Dive into Simon Dobnik's collaborations.

Top Co-Authors

John D. Kelleher

Dublin Institute of Technology

Wafia Adouane

University of Gothenburg

Jean-Philippe Bernardy

Chalmers University of Technology

Robin Cooper

University of Gothenburg
