Włodzimierz Lewoniewski
Poznań University of Economics
Publication
Featured research published by Włodzimierz Lewoniewski.
business information systems | 2015
Krzysztof Węcel; Włodzimierz Lewoniewski
The quality of data in DBpedia depends on the underlying information provided in Wikipedia’s infoboxes. Different language editions can provide different information about a given subject with respect to the set of attributes and the values of these attributes. Our research question is which language editions provide correct values for each attribute so that data fusion can be carried out. Initial experiments showed that the quality of attributes is correlated with the overall quality of the Wikipedia article providing them. Wikipedia offers functionality to assign a quality class to an article, but unfortunately the majority of articles have not been graded by the community or the grades are not reliable. In this paper we analyse the features and models that can be used to evaluate the quality of articles, providing a foundation for the relative quality assessment of infobox attributes, with the purpose of improving the quality of DBpedia.
international conference on information and software technologies | 2016
Włodzimierz Lewoniewski; Krzysztof Węcel; Witold Abramowicz
This article aims to analyse the importance of Wikipedia articles in different languages (English, French, Russian, Polish) and the impact of importance on the quality of articles. Based on an analysis of the literature and our own experience, we collected measures related to articles, specifying various aspects of quality, that will be used to build models of articles’ importance. For each language version, influential parameters are selected that may allow automatic assessment of the validity of the article. Links between articles in different languages offer opportunities for comparing and verifying the quality of information provided by various Wikipedia communities. Therefore, the model can be used not only for a relative assessment of the content of the whole article, but also for a relative assessment of the quality of data contained in its structural parts, the so-called infoboxes.
business information systems | 2017
Nina Khairova; Włodzimierz Lewoniewski; Krzysztof Węcel
We present a method of estimating the quality of articles in Russian Wikipedia that is based on counting the number of facts in the article. To calculate the number of facts we use our logical-linguistic model of fact extraction. The basic mathematical means of the model are logical-algebraic equations of the finite predicates algebra. The model allows the extraction of simple and complex types of facts from Russian sentences. We experimentally compare the effect of the density of these types of facts on the quality of articles in Russian Wikipedia. Better articles tend to have a higher density of facts.
international conference on information and software technologies | 2017
Włodzimierz Lewoniewski; Krzysztof Węcel; Witold Abramowicz
Reliable information sources are important for assessing content quality in Wikipedia. Using references, readers can verify facts or find more details about the described topic. Each Wikipedia article can have over 290 language versions. As articles can be edited independently in any language, even by anonymous users, the information about the same topic may be inconsistent. This also applies to the sources found in the various language versions of a particular article, so the same statement can have different sources. In some cases, Wikipedia users who speak two or more languages can transfer information with references between language versions. This paper presents an analysis of the use of common references in over 10 million articles in several Wikipedia language editions: English, German, French, Russian, Polish, Ukrainian, Belarusian. The study also shows the use of similar sources and their number in language-sensitive topics.
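To make the idea of common references concrete, the following is a minimal sketch, not the authors' method. It assumes that references are available as lists of URLs per language version and that two references count as the same when their host and path match after simple normalization. The function names `normalize_url` and `shared_references` and the example URLs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path so that trivial variants compare equal.

    Assumption: scheme, a leading "www.", a trailing slash and the query
    string are ignored. Dropping the query may merge distinct pages.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def shared_references(refs_by_language: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each normalized reference to the languages citing it,
    keeping only references cited in at least two language versions."""
    languages_by_ref = defaultdict(set)
    for language, urls in refs_by_language.items():
        for url in urls:
            languages_by_ref[normalize_url(url)].add(language)
    return {ref: langs for ref, langs in languages_by_ref.items() if len(langs) >= 2}


# Hypothetical references of one article in three language versions.
example = {
    "en": ["https://www.example.org/report/", "https://example.com/a"],
    "pl": ["http://example.org/report", "https://example.net/b"],
    "de": ["https://example.com/a?utm=1"],
}
result = shared_references(example)
```

Here `result` contains the two references shared by at least two versions, each with the set of language codes that cite it.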
Informatics | 2017
Włodzimierz Lewoniewski; Krzysztof Węcel; Witold Abramowicz
Although Wikipedia is often criticized for its poor quality, it continues to be one of the most popular knowledge bases in the world. Articles in this free encyclopedia on various topics can be created and edited independently in about 300 different language versions. Our research has shown that in language-sensitive topics, the quality of information can be relatively better in the relevant language versions. However, in most cases, it is difficult for Wikipedia readers to determine the language affiliation of the described subject. Additionally, each language edition of Wikipedia can have its own rules for the manual assessment of content quality. There are also differences in grading schemes between language versions: some use a 6–8 grade system to assess articles, and some are limited to 2–3. This makes automatic quality comparison of articles between various languages a challenging task, particularly if we take into account the large number of unassessed articles; some of the Wikipedia language editions have over 99% of articles without a quality grade. The paper presents the results of a relative quality and popularity assessment of over 28 million articles in 44 selected language versions. A comparative analysis of the quality and popularity of articles on popular topics was also conducted. Additionally, the correlation between the quality and popularity of Wikipedia articles on selected topics in various languages was investigated. The proposed method allows us to find articles with information of better quality that can be used to automatically enrich other language editions of Wikipedia.
business information systems | 2017
Włodzimierz Lewoniewski; Krzysztof Węcel
The online encyclopedia Wikipedia is one of the most popular sources of knowledge. It is often criticized for poor information quality. Articles can be created and edited independently, even by anonymous users, in almost 300 languages. As a result, the information quality on the same topic differs between language versions. The Wikipedia community has created a system for assessing the quality of articles, which can be helpful in deciding which language version is more complete and correct. There are several issues: each Wikipedia language edition can use its own grading scheme, and there is usually a large number of unevaluated articles. In this paper, we propose to use a synthetic measure for automatic quality evaluation of articles in different languages based on important features.
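The abstract does not define the synthetic measure, so the following is only a sketch of one plausible form: each feature is divided by a per-language threshold, capped at 1, and the capped values are averaged with equal weights. The capping, the equal weights, the function name `synthetic_quality`, and the feature names and threshold values are all assumptions made for illustration.

```python
from typing import Dict


def synthetic_quality(features: Dict[str, float], thresholds: Dict[str, float]) -> float:
    """Return the mean of threshold-normalized feature values, in [0, 1].

    Assumption: a feature that reaches its threshold contributes the
    maximum score of 1.0; missing features contribute 0.0.
    """
    if not thresholds:
        raise ValueError("at least one feature threshold is required")
    scores = []
    for name, threshold in thresholds.items():
        if threshold <= 0:
            raise ValueError(f"threshold for {name!r} must be positive")
        value = max(features.get(name, 0.0), 0.0)
        scores.append(min(value / threshold, 1.0))
    return sum(scores) / len(scores)


# Hypothetical thresholds for one language edition and one article's features.
thresholds = {"references": 40, "length": 20000, "images": 8}
article = {"references": 10, "length": 30000, "images": 4}
score = synthetic_quality(article, thresholds)  # (0.25 + 1.0 + 0.5) / 3
```

Because every language edition gets its own thresholds, scores computed this way stay on a common 0 to 1 scale even when the editions use different grading schemes.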
business information systems | 2017
Włodzimierz Lewoniewski
Although Wikipedia is one of the most popular sources of information in the world, it is often criticized for the poor quality of its content. In this online encyclopaedia, articles on the same topic can be created and edited independently in different languages. Some of these language versions can provide valuable information on specific topics. Wikipedia articles may include an infobox, which is used to collect and present a subset of important information about the subject. This study presents a method for quality assessment of Wikipedia articles and of the information contained in their infoboxes. Choosing the best language versions of a particular article will allow for the enrichment of information in its less developed language versions.
international conference on information and software technologies | 2017
Włodzimierz Lewoniewski; Nina Khairova; Krzysztof Węcel; Nataliia Stratiienko; Witold Abramowicz
The assessment of the quality and credibility of Wikipedia articles is becoming increasingly important. We propose to use morphological and semantic features to estimate the quality of Wikipedia articles in the Russian language. We distinguished over 150 linguistic features and divided them into four groups. In these groups, we considered the features of encyclopedic style, readability and subjectivism of the article’s text. Using Random Forest as the classification algorithm, we show the most important linguistic features that affect the quality of Russian Wikipedia articles. We compare the classification results of our four groups of linguistic features separately. We achieved an F-measure of 89.75%.
international conference on information and software technologies | 2018
Włodzimierz Lewoniewski; Krzysztof Węcel; Witold Abramowicz
Wikipedia is the most popular and the largest user-generated source of knowledge on the Web. The quality of the information in this encyclopedia is often questioned. Therefore, Wikipedians have developed an award system for high-quality articles that follow specific style guidelines. Nevertheless, more than 1.2 million articles in Polish Wikipedia are unassessed. This paper considers over 100 linguistic features to determine the quality of Wikipedia articles in the Polish language. We evaluate our models on 500,000 articles of Polish Wikipedia. Additionally, we discuss the importance of linguistic features for quality prediction.
international conference on information and software technologies | 2018
Włodzimierz Lewoniewski; Ralf-Christian Härting; Krzysztof Węcel; Christopher Reichstein; Witold Abramowicz
The leading online encyclopedia Wikipedia struggles with inconsistent article quality caused by its collaborative editing model. While one can find many helpful articles with consistent information on Wikipedia, there are also many questionable articles with unclear or unfinished information. The quality of each article may vary over time as different users repeatedly re-edit content. Among the most important elements of Wikipedia articles are references, which allow readers to verify content and see its source. Because most of these references are web pages, it is possible to get more information about their quality by using citation analysis tools. For science and practice, empirical proof of the quality of Wikipedia articles could have a further signal effect, as the citation of Wikipedia articles, especially in scientific practice, is not yet recognised. This paper presents general results of a Wikipedia analysis using metrics from the SISTRIX Toolbox, which is one of the leading providers of indicators for Search Engine Optimization (SEO). In addition to the preliminary analysis of Wikipedia articles as separate web pages, we extracted data from more than 30 million references in different language versions of Wikipedia and analyzed over 180 thousand of the most popular hosts. We also compared the same sources from different geographical perspectives using country-specific visibility indices.
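The step of finding the most popular hosts among reference URLs can be sketched as below. This is an illustrative outline under stated assumptions, not the pipeline used in the paper: references are assumed to be plain URL strings, a leading "www." is ignored, and the function name `top_hosts` and the example URLs are hypothetical. No SISTRIX functionality is used or modelled here.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def top_hosts(reference_urls: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count references per host and return the n most frequent hosts."""
    hosts = Counter()
    for url in reference_urls:
        try:
            host = urlsplit(url).hostname  # lowercased; None if no network location
        except ValueError:
            continue  # skip malformed URLs
        if not host:
            continue
        if host.startswith("www."):
            host = host[4:]
        hosts[host] += 1
    return hosts.most_common(n)


# Hypothetical reference URLs collected from several language versions.
urls = [
    "https://www.example.org/a",
    "http://example.org/b",
    "https://example.com/x",
    "not a url",
]
ranking = top_hosts(urls, n=2)
```

The resulting host list is the kind of input that could then be looked up in an external visibility index, which lies outside this sketch.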