Jozef Kapusta
Constantine the Philosopher University in Nitra
Publications
Featured research published by Jozef Kapusta.
International Conference on Conceptual Structures | 2010
Michal Munk; Jozef Kapusta; Peter Švec
Every data analysis presupposes data, regardless of its focus (visit rate analysis, portal optimization, portal personalization, etc.). The results of an analysis depend heavily on the quality of the analyzed data. For portal usage analysis, these data can be obtained by monitoring the web server log file. From these data we can build data matrices and a web map, which are then used to search for users' behaviour patterns. Preparing data from the log file is the most time-consuming phase of the whole analysis. We carried out an experiment to find out to what extent this time-consuming data preparation is necessary. We aimed to specify the steps that are required to obtain valid data from the log file. In particular, we focused on reconstructing the activities of a web visitor, an advanced data preprocessing technique that is itself time-consuming. In the article we assess the impact of reconstructing a web visitor's activities on the quantity and quality of the extracted rules, which represent the web users' behaviour patterns.
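One common preprocessing step behind this kind of log analysis is splitting each visitor's requests into sessions. The sketch below is only an illustration under stated assumptions: the abstract does not say how sessions were identified, so the timeout heuristic, the 1800-second threshold, the record layout and the name `split_sessions` are all hypothetical.

```python
from collections import defaultdict


def split_sessions(records, timeout=1800):
    """Group (visitor, timestamp, url) records into per-visitor sessions.

    A new session starts whenever the gap between two consecutive requests
    of the same visitor exceeds `timeout` seconds (an assumed threshold,
    not a value taken from the paper).
    """
    by_visitor = defaultdict(list)
    for visitor, timestamp, url in records:
        by_visitor[visitor].append((timestamp, url))

    sessions = []
    for visitor, hits in by_visitor.items():
        hits.sort()  # order each visitor's requests by time
        current = [hits[0][1]]
        for (prev_time, _), (time, url) in zip(hits, hits[1:]):
            if time - prev_time > timeout:
                # Gap too long: close the running session, start a new one.
                sessions.append((visitor, current))
                current = []
            current.append(url)
        sessions.append((visitor, current))
    return sessions


# Hypothetical log records: (visitor id, seconds since start, requested URL)
records = [
    ("10.0.0.1", 0, "/home"),
    ("10.0.0.1", 60, "/study"),
    ("10.0.0.1", 4000, "/home"),
    ("10.0.0.2", 30, "/news"),
]
sessions = split_sessions(records)
```

With these sample records the first visitor is split into two sessions, because the third request arrives more than 1800 seconds after the second.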
Procedia Computer Science | 2011
Michal Munk; Marta Vrábelová; Jozef Kapusta
Analysing the behaviour of portal visitors is one of the most important parts of web portal optimization. The results of the analysis are important for further correcting and improving the organization of the web parts. The aim of the paper is to model the probabilities of access to the categories of web parts of the portal. We deal with the access probabilities for the individual categories of faculty portal content depending on the hour of the day and the day of the week. The probabilities are estimated using a multinomial logit model, separately for employees and students. In the logit models for both students and employees, the days of the week, represented by dummy variables (MON, TUE,…), are statistically significant. On the other hand, the hour of the day, represented by the variable HOUR_DAY and its square HOUR_DAY_Q, is statistically significant only in the case of students. These results agree with the computed probabilities: over the course of the day, the probabilities of access to the web parts of the portal are more stable for employees than for students. The analysis gave us several interesting and surprising results. For instance, the "study" part is the part most visited by students in the evening and night hours. The results also confirmed general trends; for example, the "announcements" part is the most visited part in the morning hours, especially at the beginning of the week. All of the results will help us to further optimize our web portal, especially in adapting the portal to the user and to the hour of access.
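For orientation, a multinomial logit model with the variables named in the abstract can be written in the standard textbook form below. This is a sketch, not the paper's exact parameterisation: the symbols $\alpha_j$, $\beta_{jd}$, $\gamma_j$, $\delta_j$, the index $d$ over the day dummies $D_d$ (MON, TUE, …) and the choice of category $J$ as the reference category are assumptions.

```latex
\[
P(Y = j \mid \mathbf{x}) =
  \frac{\exp(\eta_j)}{1 + \sum_{k=1}^{J-1} \exp(\eta_k)},
  \qquad j = 1, \dots, J-1,
\qquad
P(Y = J \mid \mathbf{x}) =
  \frac{1}{1 + \sum_{k=1}^{J-1} \exp(\eta_k)},
\]
\[
\eta_j = \alpha_j + \sum_{d} \beta_{jd}\, D_d
       + \gamma_j\, \mathrm{HOUR\_DAY}
       + \delta_j\, \mathrm{HOUR\_DAY\_Q}.
\]
```

The squared term HOUR_DAY_Q lets the access probability rise and fall over the day instead of changing monotonically with the hour.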
International Conference on Computational Collective Intelligence | 2014
Jozef Kapusta; Michal Munk; Martin Drlík
The paper introduces an alternative method for website analysis that combines two web mining research fields: discovering web users' behaviour patterns and discovering knowledge from the website structure. The main objective of the paper is to identify web pages whose importance, as estimated by the website developers, does not correspond to how visitors actually perceive them. The paper presents a case study that used the proposed method for identifying suspicious web pages by analysing the expected and observed probabilities of accesses to the web pages. The expected probabilities were calculated using the PageRank method, and the observed probabilities were obtained from the web server log file. The observed and expected data were compared using residual analysis. The results can be used to identify potential problems with the structure of the observed website.
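The comparison of expected and observed probabilities can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses textbook power-iteration PageRank with the commonly quoted damping factor 0.85, and it reports plain differences between observed and expected probabilities, which may differ from the residual statistic used in the paper. The link graph, the visit counts and the function names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Every link target must also appear as a key of `links`.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Random-jump share that every page receives.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page without outgoing links spreads its rank over all pages.
            targets = links[page] or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def residuals(expected, visits):
    """Observed access probability minus expected probability, per page."""
    total = sum(visits.values())
    return {page: visits[page] / total - expected[page] for page in expected}


# Hypothetical site structure and visit counts taken from a log file.
links = {"home": ["study", "news"], "study": ["home"], "news": ["home"]}
visits = {"home": 500, "study": 450, "news": 50}

expected = pagerank(links)
diff = residuals(expected, visits)
```

In this toy data "study" and "news" have the same expected probability because the structure treats them symmetrically, yet "study" is visited far more often. A large positive or negative difference is the kind of signal that would mark a page as a candidate for closer inspection.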
International Conference on Conceptual Structures | 2013
Michal Munk; Anna Pilková; Jozef Kapusta; Peter Švec; Martin Drlík
The paper analyses the interest of domestic and foreign market participants in the mandatory Basel 2, Pillar 3 information disclosure of a commercial bank during the recent financial crisis. The authors try to ascertain whether the purposes of the Basel 2 regulations under Pillar 3 (Market discipline), namely publishing financial and risk-related information, have been fulfilled. The paper therefore focuses on modelling visitors' behaviour at the commercial bank's website, where the information required by Basel 2 is available. The authors present a detailed analysis of the user log data stored by web servers. The analysis can help to better understand how much the mandatory and optional Pillar 3 disclosure web pages on the commercial bank's website were used during the recent financial crisis in Slovakia. The authors used association rule analysis to identify associations among the content categories of the website. The results show that stakeholders generally have little interest in the commercial bank's mandatory disclosure of financial information. Foreign website visitors were more concerned with information disclosed under the Pillar 3, Basel 2 regulation, and were less interested in general information about the bank, than domestic visitors.
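Association rules between content categories are usually judged by their support and confidence. The sketch below shows how these two measures are computed for a single rule; the sessions, the category names and the function `rule_metrics` are hypothetical and are not taken from the paper's data.

```python
def rule_metrics(sessions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = share of all sessions that contain both categories
    confidence = share of sessions with the antecedent that also contain
                 the consequent
    """
    n = len(sessions)
    with_antecedent = sum(1 for s in sessions if antecedent in s)
    with_both = sum(
        1 for s in sessions if antecedent in s and consequent in s
    )
    support = with_both / n
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Hypothetical sessions, each reduced to the set of visited categories.
sessions = [
    {"home", "pillar3"},
    {"home", "products"},
    {"home", "pillar3", "reports"},
    {"products"},
]
support, confidence = rule_metrics(sessions, "home", "pillar3")
```

Here the rule home -> pillar3 holds in two of four sessions (support 0.5) and in two of the three sessions that contain "home" (confidence 2/3).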
International Conference on Interactive Collaborative Learning | 2011
Martin Cápay; Jozef Kapusta; Martin Magdin; Miroslava Mesárošová; Peter Švec
In the paper, we describe a one-day project aimed at popularizing scientific fields, carried out by eight departments of the Faculty of Natural Sciences, Constantine the Philosopher University in Nitra. The project was named Scientific Fair - Science you can see, hear and experience. Its main goal was to present seven scientific fields. The popularization took the form of experimental activities, whose aim was to inspire the audience, arouse their interest in science and motivate the participants to engage in cognitive activities. We introduce the idea of the project in detail. We cover both the marketing and the content points of view, concentrating mainly on the informatics activities run by the Department of Informatics.
Trans. Computational Collective Intelligence | 2015
Jozef Kapusta; Michal Munk; Martin Drlík
The paper describes an alternative method of website analysis and optimization that combines methods of web usage mining and web structure mining: discovering web users' behaviour patterns and discovering knowledge from the website structure. Its primary objective is to identify web pages whose importance, as estimated by the website developers, does not correspond to the real behaviour of the website visitors. It was shown earlier that the expected visit rate correlates with the observed visit rate of the web pages. Consequently, the expected probabilities of a visitor visiting the web pages were calculated using the PageRank method, and the observed probabilities were obtained from the web server log files using a web usage mining method. The observed and expected probabilities were compared using residual analysis. While sequence rule analysis can only uncover potential problems of web pages with a higher visit rate, the proposed residual analysis can also consider web pages with a smaller visit rate. The results can be used for website optimization and restructuring, for improving website navigation, and for implementing adaptive websites.
Archive | 2018
Jozef Kapusta; Michal Munk; Peter Švec
In this paper we describe various approaches to calculating the PageRank value. There are several methods for calculating PageRank, from the basic historical one to more enhanced versions. Most of them use the original value of the damping factor. We describe an experiment that we carried out using our method for analysing differences between the expected and observed probabilities of accesses to the web pages of a selected portal. We used five slightly different methods for PageRank estimation, using both the original value of the damping factor and a value calculated from data in the web server log file. We assumed and confirmed that the estimation/calculation of the damping factor would have a significant impact on the estimation of PageRank. We also assumed, wrongly as it turned out, that the estimation/calculation of the damping factor would have a significant impact on the number of suspicious pages. We also compared the computational complexity of the PageRank methods used, and the most efficient method seems to be the one with the estimated value of the damping factor.
International Conference on Applied Physics, System Science and Computers | 2017
Dominik Halvoník; Jozef Kapusta
When analyzing students' behavior, it is possible to use a variety of web mining techniques in addition to basic descriptive statistics. These techniques are applied in order to identify the most frequent ways in which students explore courses, frequent problems in completing an e-learning course, problems in individual tests, or problematic parts of educational materials. The aim of this paper is to introduce our own methodology for identifying problematic parts of educational content. We use two areas of web mining: web content mining and web usage mining. By applying basic web content mining techniques, we created a page size metric for the training course. We experimented with the time spent on educational pages. The content size of the individual parts of the training course is expected to predict the time students spend studying them. We analyzed the dependence between the number of words on a web page with educational content and the time spent by students on that page. In the discussion and in the conclusion of this paper, we address the problematic parts of analysing the differences between the content length of educational materials and the time spent on them. We also consider other metrics for comparing these variables.
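One simple way to quantify the dependence between page length and reading time is the Pearson correlation coefficient. The abstract does not state which statistic the authors used, so the choice of Pearson's r, the sample numbers and the function name `pearson` are assumptions made purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Hypothetical data: words per educational page and mean seconds spent on it.
word_counts = [120, 300, 450, 800]
seconds = [40, 95, 150, 260]
r = pearson(word_counts, seconds)
```

Pages whose reading time lies far from what their word count suggests would be natural candidates for the "problematic parts" that the paper looks for.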
International Conference on Applied Physics, System Science and Computers | 2017
Jozef Kapusta; Ľubomír Benko
This paper focuses on improving the output of post-edited Machine Translation. It introduces a novel recommender system created to help post-editors correct translations produced by Machine Translation. The aim of the paper is to describe the design and functionality of the proposed system. Pairs of segments from Machine Translation and the corresponding post-edited text were analysed using an automated parser. The likelihood of each recommendation was calculated in order to select the word with the highest probability, based on similarity in words, tags and lemmas. The introduced approach can help to create a versatile recommender system that helps post-editors to improve their translations.
Advanced Industrial Conference on Telecommunications | 2016
Daša Munková; Michal Munk; Jozef Kapusta; Jaroslav Reichel
The objective of the paper is to evaluate metrics for the automatic evaluation of machine translation output against two manual metrics: fluency and adequacy. We tried to answer the question of how far manual evaluation correlates with automatic evaluation of MT output between Slovak and English. We focused on metrics based on similarity and statistical principles (WER, PER, CDER and BLEU-n). We found that manual evaluation, namely the fluency and adequacy metrics, correlates with the automatic metrics of MT evaluation for a less widely spoken, low-resource language such as Slovak. The contribution also includes a proposal of a system for both manual (based on POS tagging) and automatic (based on a reference) evaluation of MT output.
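Of the automatic metrics listed, WER (word error rate) is the simplest to illustrate: it is the word-level edit distance between the MT output and a reference translation, divided by the reference length. The sketch below is a generic textbook implementation, not the evaluation system proposed in the paper; whitespace tokenisation, the example sentences and the function name are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Assumes a non-empty reference and simple whitespace tokenisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,                 # reference word missing
                    current[j - 1] + 1,              # extra hypothesis word
                    previous[j - 1] + substitution,  # match or substitution
                )
            )
        previous = current
    return previous[-1] / len(ref)


# Hypothetical reference translation and MT output.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

A lower value means the MT output is closer to the reference; one missing word out of six reference words gives a WER of 1/6.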