Krisztián Monostori | Researchain

Publication

Featured research published by Krisztián Monostori.

acm international conference on digital libraries | 2000

Document overlap detection system for distributed digital libraries

Krisztián Monostori; Arkady B. Zaslavsky; Heinz W. Schmidt

In this paper we introduce the MatchDetectReveal (MDR) system, which is capable of identifying overlapping and plagiarised documents. Each component of the system is briefly described. The matching-engine component uses a modified suffix tree representation, which is able to identify the exact overlapping chunks; its performance is also presented.

international conference on computational science | 2002

Comparison of Overlap Detection Techniques

Krisztián Monostori; Raphael A. Finkel; Arkady B. Zaslavsky; Gábor Hodász; Máté Pataki

Easy access to the World Wide Web has raised concerns about copyright issues and plagiarism. It is easy to copy someone else's work and submit it as one's own. This problem has been targeted by many systems, which use very similar approaches. These approaches are compared in this paper, and suggestions are made as to when different strategies are more applicable than others. Some alternative approaches are proposed that perform better than previously presented methods. These previous methods share two common stages: chunking of documents and selection of representative chunks. We study both stages and also propose alternatives that are better in terms of accuracy and space requirement. The applications of these methods are not limited to plagiarism detection but may target other copy-detection problems. We also propose a third stage to be applied in the comparison that uses suffix trees and suffix vectors to identify the overlapping chunks.
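
The two common stages named in this abstract, chunking and selection of representative chunks, can be illustrated with a minimal Python sketch. It is not the paper's method: the overlapping word chunks, the truncated MD5 fingerprint, the "hash divisible by a modulus" selection rule, and every function name below are assumptions chosen only for illustration.

```python
import hashlib


def chunks(text, k=3):
    """Stage 1 (assumed variant): overlapping chunks of k consecutive words."""
    words = text.lower().split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def chunk_hash(chunk):
    """Stable 32-bit fingerprint of a chunk (truncated MD5, an illustrative choice)."""
    return int.from_bytes(hashlib.md5(chunk.encode("utf-8")).digest()[:4], "big")


def fingerprint(text, k=3, modulus=1):
    """Stage 2 (assumed variant): keep only hashes divisible by `modulus`.

    modulus=1 keeps every chunk; larger values keep a smaller sample,
    trading accuracy for space.
    """
    return {h for h in map(chunk_hash, chunks(text, k)) if h % modulus == 0}


def resemblance(a, b, k=3, modulus=1):
    """Share of retained fingerprints that the two documents have in common."""
    fa, fb = fingerprint(a, k, modulus), fingerprint(b, k, modulus)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Calling `resemblance(doc_a, doc_b)` with the default `modulus=1` compares every chunk; raising `modulus` shrinks the stored index at the cost of possibly missing short overlaps.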

european conference on research and advanced technology for digital libraries | 2001

Using Copy-Detection and Text Comparison Algorithms for Cross-Referencing Multiple Editions of Literary Works

Arkady B. Zaslavsky; Alejandro Bia; Krisztián Monostori

This article describes a joint research work between Monash University and the University of Alicante, where software originally meant for plagiarism and copy detection in academic works is successfully applied to perform comparative analysis of different editions of literary works. The experiments were performed with Spanish texts from the Miguel de Cervantes digital library. The results have proved useful for literary and linguistic research, automating part of the tedious task of comparative text analysis. Besides, other interesting uses were detected.

Proceedings 24th Australian Computer Science Conference. ACSC 2001 | 2001

Efficiency of data structures for detecting overlaps in digital documents

Krisztián Monostori; Arkady B. Zaslavsky; Heinz W. Schmidt

This paper analyses the efficiency of different data structures for detecting overlap in digital documents. Most existing approaches use some hash function to reduce the space requirements for their indices of chunks. Since a hash function can produce the same value for different chunks, false matches are possible. In this paper we propose an algorithm that can be used for eliminating those false matches. This algorithm uses a suffix tree structure, which is space consuming. We define a modified suffix tree that only considers chunks starting at the beginning of words and we show how the algorithm can work on this structure. We can alternatively reduce space requirements of a suffix tree by converting it to a directed acyclic graph. We show that suffix link information can be preserved in this new structure and the matching statistics algorithm still works with those modifications that we propose.
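
The false-match problem described in this abstract can be shown with a short Python sketch. The paper eliminates false matches with a suffix tree; the sketch below swaps in a plain string comparison, which is far simpler and is used here only to make the problem visible. The deliberately weak hash and all function names are hypothetical.

```python
def weak_hash(chunk, buckets=8):
    """Deliberately tiny hash (byte sum modulo `buckets`) so collisions are easy to provoke."""
    return sum(chunk.encode("utf-8")) % buckets


def word_chunks(text, k=2):
    """Chunks of k consecutive words, each starting at the beginning of a word."""
    words = text.split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def candidate_matches(doc_a, doc_b, k=2):
    """Chunk pairs whose hashes agree; the result may contain false matches."""
    index = {}
    for chunk in word_chunks(doc_a, k):
        index.setdefault(weak_hash(chunk), []).append(chunk)
    return [(a, b) for b in word_chunks(doc_b, k) for a in index.get(weak_hash(b), [])]


def verified_matches(doc_a, doc_b, k=2):
    """Drop false matches by comparing the chunk text itself (stand-in for the suffix tree step)."""
    return [(a, b) for a, b in candidate_matches(doc_a, doc_b, k) if a == b]
```

For example, `"ab cd"` and `"ba dc"` have the same byte sum, so `candidate_matches("ab cd", "ba dc")` reports them as a pair, while `verified_matches("ab cd", "ba dc")` returns an empty list.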

international symposium on algorithms and computation | 2001

Suffix Vector: A Space-Efficient Suffix Tree Representation

Krisztián Monostori; Arkady B. Zaslavsky; István Vajk

This paper introduces a new way of representing suffix trees. The basic idea behind the representation is that we are storing the nodes of the tree along with the string itself, thus edge labels can directly be read from the string. The new representation occupies less space than the best-known representation to date in case of English text and program files, though it requires slightly more space in case of DNA sequences. We also believe that our representation is clearer and thus implementing algorithms on it is easier. We also show that our representation is not only better in terms of space but it is also faster to retrieve information from the tree. We theoretically compare the running time of the matching statistics algorithm on both representations.

ACSC '02 Proceedings of the twenty-fifth Australasian conference on Computer science - Volume 4 | 2002

Suffix vector: space- and time-efficient alternative to suffix trees

Krisztián Monostori; Arkady B. Zaslavsky; Heinz W. Schmidt

Suffix trees are versatile data structures that are used for solving many string-matching problems. One of the main arguments against widespread usage of the structure is its space requirement. This paper describes a new structure called suffix vector, which is not only better in terms of storage space but also simpler than the most efficient suffix tree representation known to date. Alternatives of storage representations are discussed and a linear-time construction algorithm is also proposed in this paper. Space requirement of the suffix vector structure is compared to the space requirement of alternative suffix tree representations. We also make a theoretical comparison on the number of operations required to run algorithms on the suffix vector.

parallel computing | 2000

Parallel and Distributed Document Overlap Detection on the Web

Krisztián Monostori; Arkady B. Zaslavsky; Heinz W. Schmidt

Proliferation of digital libraries plus availability of electronic documents from the Internet have created new challenges for computer science researchers and professionals. Documents are easily copied and redistributed or used to create plagiarised assignments and conference papers. This paper presents a new, two-stage approach for identifying overlapping documents. The first stage is identifying a set of candidate documents that are compared in the second stage using a matching-engine. The algorithm of the matching-engine is based on suffix trees and it modifies the known matching statistics algorithm. Parallel and distributed approaches are discussed at both stages and performance results are presented.
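
The matching statistics that the matching engine builds on can be defined compactly: for each position i of a suspicious document, the length of the longest substring starting at i that also occurs in the source document. The Python sketch below computes this by brute force, without a suffix tree, so it is quadratic or worse and is meant only as an illustration of the definition; the function names and the `min_len` threshold are assumptions, not part of the paper.

```python
def matching_statistics(pattern, text):
    """ms[i] = length of the longest prefix of pattern[i:] that occurs in text.

    Brute-force version for illustration; a suffix tree of `text`
    yields the same values in linear time.
    """
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Extend the match one character at a time while it still occurs in text.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


def overlapping_positions(pattern, text, min_len):
    """Start positions in pattern whose match with text is at least min_len long."""
    return [i for i, m in enumerate(matching_statistics(pattern, text)) if m >= min_len]
```

In a two-stage setting like the one the abstract describes, a function of this kind would run only on the candidate documents that survive the first stage.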

AICPS | 2002