Šandor Dembitz | Researchain

Publication

Featured research published by Šandor Dembitz.

Software - Practice and Experience | 2011

Advantages of online spellchecking: a Croatian example

Šandor Dembitz; Mirko Randić; Gordan Gledec

Online spellchecking is commonly regarded as an auxiliary way of performing spellchecking. However, it offers a unique opportunity to constantly improve spellchecker linguistic functionality through interaction with the community of spellchecker users. Such a possibility is crucial for spellchecking in non‐central and under‐resourced languages, in order to overcome gaps in NLP tools between them and central languages. The paper describes Hascheck, a Croatian online spellchecker able to learn words from texts it receives. It started as the first Croatian spellchecker, hence as a basic NLP tool for an under‐resourced language, but due to its learning ability it demonstrates linguistic functionality comparable to that of conventional central‐language spellcheckers. Based on these experiences we also discuss the future of online spellchecking in the context of global NLP tasks.

agent and multi agent systems technologies and applications | 2012

Informativeness of inflective noun bigrams in Croatian

Damir Jurić; Marko Banek; Šandor Dembitz

A feature of Croatian and other Slavic languages is a rich inflection system, which does not exist in English and other languages that traditionally dominate the scientific focus of computational linguistics. In this paper we present the results of experiments conducted on the corpus of the Croatian online spellchecker Hascheck, which point to using non-nominative cases for discovering collocations between two nouns, specifically the first name and the family name of a person. We analyzed the frequencies and conditional probabilities of the morphemes corresponding to Croatian cases and quantified the level of attraction between two words using the normalized pointwise mutual information measure. Two components of a personal name are more likely to co-occur in any of the non-nominative cases than in the nominative. Furthermore, given a component of a personal name, the conditional probability that it is accompanied by the other component of the name is higher for the genitive/accusative and instrumental cases than for the nominative.
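The normalized pointwise mutual information measure mentioned in the abstract can be illustrated with a minimal sketch. The function name `npmi` and its count-based signature are assumptions made for this example, not the authors' implementation; it uses the standard definition NPMI(x, y) = PMI(x, y) / (-log p(x, y)), which ranges from -1 (never co-occur) through 0 (independent) to 1 (always co-occur).

```python
import math


def npmi(pair_count: int, x_count: int, y_count: int, total: int) -> float:
    """Normalized PMI of two words from raw counts (hypothetical helper).

    pair_count: number of bigram positions where x is followed by y
    x_count, y_count: number of positions holding x and y respectively
    total: number of positions observed
    """
    if pair_count == 0:
        return -1.0  # limiting value when the pair never occurs
    if pair_count == total:
        return 1.0  # degenerate case: -log p(x, y) would be zero
    # PMI = log( p(x, y) / (p(x) * p(y)) ), written directly in counts
    pmi = math.log(pair_count * total / (x_count * y_count))
    # Normalize by -log p(x, y) to map the score into [-1, 1]
    return pmi / -math.log(pair_count / total)
```

With toy counts, a pair whose words only ever appear together scores close to 1, and a pair that co-occurs exactly as often as chance predicts scores close to 0. Comparing such scores for the same name in different case forms is one way to read the comparison the abstract describes.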

mediterranean electrotechnical conference | 2000

Model checking of concurrent system with SDL⁻⁻ specification

Bruno Blašković; Šandor Dembitz; Petar Knezevic

It is well known that the best results regarding concurrent system design are obtained when design errors are found in the earliest possible phase. For that purpose system specification is verified through model checking. We try to hide, as much as possible, the model checking formalism from the designer's viewpoint. First, a system is modeled as a set of processes described formally as an extended finite state machine within the SDL⁻⁻ language. Such a description is translated into the input of the model checker SPIN, where the desired properties are verified. Special attention is given to the possibility of modeling various types of transitions and to a definition of the tool where model checking is performed. With such an approach the designer can have the SDL⁻⁻ specification verified against the desired properties.

Archive | 1999

Hascheck - The Croatian Academic Spelling Checker

Šandor Dembitz; Petar Knezevic; Mladen Sokele

The Croatian Academic Spelling Checker, or Hascheck, is a telematic service embedded in e-mail. The user sends his/her text to an address and waits for an automatic reply in the form of a Hascheck report. As a program, Hascheck is a learning semiautomaton. First, it evaluates unrecognised strings from a text in a fuzzy manner: some of them are extremely peculiar, others are very or moderately peculiar, and the rest are almost non-peculiar strings, i.e. almost certainly words. Then, less peculiar strings are processed by a tagger. Last, after a minor human intervention, a collection of words to be learned is obtained. In this paper we briefly describe the string classifying algorithm and its selectivity. We also describe the tagging algorithm and its efficiency. Experience gained during four years of service operation, accompanied by two analytic functions describing the learning process, is also presented. Finally, we discuss project costs and benefits.

mediterranean electrotechnical conference | 1998

Computational proofreading of the Croatian lexicon

Šandor Dembitz; Mladen Sokele

The design of a spelling checker for a highly inflected language is commonly regarded as a difficult problem. We present an approach to this problem that is mainly statistically based. The approach was tested on the Croatian language, and an unconventional spelling checking tool was developed. The results obtained by performing the most demanding task for any spelling checker, the proofreading of a huge lexicon, indicate that this approach could be applicable to many languages.

Procedia Computer Science | 2014

An economic approach to big data in a minority language

Šandor Dembitz; Gordan Gledec; Mladen Sokele

Google's n-gram project recently brought the benefits of big data to several major world languages, such as English and Chinese. Any attempt to build such systems for the world's minority languages in the manner used in that project, with the aim of accelerating the development of NLP applications, encounters many obstacles. This paper presents an innovative and economic approach to large-scale n-gram system creation applied to the Croatian language case. Instead of using the Web as the world's biggest text repository, our process of n-gram collection relies on the Croatian academic online spellchecker Hascheck, a language service publicly available since 1993 and popular worldwide. The service has already processed a corpus whose size exceeds the size of the Croatian web-corpus created in recent years. Contrary to the Google n-gram systems, where cutoff criteria were applied, our n-gram filtering is based on dictionary criteria. This resulted in a system comparable in size to the largest n-gram systems of today. Because of the reliance on a service in constant use, the Croatian n-gram system is a dynamic one, unique among the systems compared. The importance of having an n-gram infrastructure for rapid breakthroughs in new application areas is also exemplified in the paper.
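The contrast between cutoff-based and dictionary-based n-gram filtering can be sketched as follows. This is an illustration under stated assumptions, not the Hascheck procedure: the abstract does not spell out the dictionary criteria, so the sketch assumes the simplest reading, that an n-gram is kept only if every token is in a word list. The function names and the toy tokens are hypothetical.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def filter_by_cutoff(counts, min_count):
    """Frequency cutoff: keep n-grams seen at least min_count times."""
    return {gram: c for gram, c in counts.items() if c >= min_count}


def filter_by_dictionary(counts, dictionary):
    """Assumed dictionary criterion: keep n-grams whose tokens are all known words."""
    return {gram: c for gram, c in counts.items()
            if all(word in dictionary for word in gram)}


# Toy input; "dobarr" stands in for a misspelling absent from the word list.
tokens = "dobar dan svima dobarr dan svima".split()
counts = Counter(extract_ngrams(tokens, 2))
word_list = {"dobar", "dan", "svima"}

by_cutoff = filter_by_cutoff(counts, min_count=2)
by_dictionary = filter_by_dictionary(counts, word_list)
```

In this toy case the cutoff discards the valid but rare bigram ("dobar", "dan"), while the dictionary filter keeps it and drops only the bigrams containing the unknown token. That difference shows why a dictionary criterion can retain low-frequency n-grams that a cutoff would lose.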

international conference on knowledge based and intelligent information and engineering systems | 2010

Architecture of Hascheck: an intelligent spellchecker for Croatian language

Šandor Dembitz; Gordan Gledec; Bruno Blašković

The design and development of a spellchecker for highly inflected languages is commonly regarded as a challenging task. In this paper we present the architecture of Hascheck, a spellchecking system developed for the Croatian language. We describe functional elements that make it an intelligent system and discuss specific issues related to Hascheck's dictionary size as well as its guessing and learning capabilities.

mediterranean electrotechnical conference | 2000

JERS radar data in environment pollution monitoring-Zagreb example

Goran Hudec; Šandor Dembitz

The Jakusevac dumpsite is located along the right bank of the Sava river near Zagreb. As modern methods for garbage recycling have not yet been implemented, Jakusevac is at the moment one of the largest open dump locations in Europe. It is a place of great concern with regard to environmental pollution. To obtain information about potential pollution outflows, remote sensing methods have been applied. The primary concern in pollution prevention in urban areas is the protection of water flows. Besides open water outflows, there are underground ones that can be detected only through an increase in surface soil moisture. JERS radar data, with a spatial resolution of 12.5 meters and a good capability for detecting water content, were chosen for environmental monitoring and processed. The processed data have shown that the endangered area is much larger than previously suspected. As a result, investigating new areas for ground control points has been recommended, as well as continuous environmental monitoring by remote sensing.

computer and information technology | 2003