Koraljka Golub | Researchain

Publication

Featured research published by Koraljka Golub.

European Conference on Research and Advanced Technology for Digital Libraries | 2005

Importance of HTML structural elements and metadata in automated subject classification

Koraljka Golub; Anders Ardö

The aim of the study was to determine how significance indicators assigned to different Web page elements (internal metadata, title, headings, and main text) influence automated classification. The data collection comprised 1,000 Web pages in engineering, to which Engineering Information classes had been manually assigned. The significance indicators were derived using several different methods: (total and partial) precision and recall, semantic distance and multiple regression. It was shown that for best results all the elements have to be included in the classification process. The exact way of combining the significance indicators turned out not to be overly important: using the F1 measure, the best combination of significance indicators yielded a performance no more than 3% higher than the baseline.
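The F1 measure mentioned above is the harmonic mean of precision and recall. The sketch below shows, under stated assumptions, how per-element term matches might be weighted and summed into class scores, and how a set of predicted classes could be scored with F1 against manually assigned classes. The element weights, the function names `score_classes` and `f1`, and the input format are hypothetical illustrations, not the significance indicators or the code used in the study.

```python
from collections import Counter

# Hypothetical significance weights per Web page element. The values are
# placeholders, NOT the indicators derived in the study.
ELEMENT_WEIGHTS = {"metadata": 1.0, "title": 1.0, "headings": 1.0, "text": 1.0}


def score_classes(element_matches, weights=ELEMENT_WEIGHTS):
    """Sum class scores across elements, scaling each element's matches by its weight.

    element_matches maps an element name to a Counter of class -> match count.
    Elements without a weight contribute nothing.
    """
    totals = Counter()
    for element, matches in element_matches.items():
        w = weights.get(element, 0.0)
        for cls, count in matches.items():
            totals[cls] += w * count
    return totals


def f1(predicted, gold):
    """F1 = harmonic mean of precision and recall over sets of class labels."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: classes both predicted and assigned
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting classes `A` and `B` for a page manually assigned only `A` gives precision 0.5 and recall 1.0, so F1 is 2/3. Changing the weights in `ELEMENT_WEIGHTS` is where element-level significance indicators would enter such a scheme.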

Journal of Documentation | 2006

Automated subject classification of textual web documents

Koraljka Golub

Purpose - To provide an integrated perspective on similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with these approaches and with automated classification as such. Design/methodology/approach - A range of works dealing with automated classification of full-text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages. Findings - Provides major similarities and differences between the three approaches: document pre-processing and utilization of web-specific document characteristics are common to all the approaches; the major differences lie in the applied algorithms and in whether the vector space model and controlled vocabularies are employed. Problems of automated classification are recognized. Research limitations/implications - The paper does not attempt to provide an exhaustive bibliography of related resources. Practical implications - As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community can learn how similar tasks are conducted in other communities. Originality/value - To the author's knowledge, no review paper on automated text classification had attempted to discuss more than one community's approach from an integrated perspective.

The New Review of Hypermedia and Multimedia | 2006

Automated subject classification of textual Web pages, based on a controlled vocabulary: challenges and recommendations.

Koraljka Golub

The primary objective of this study was to identify and address problems of applying a controlled vocabulary in automated subject classification of textual Web pages, in the area of engineering. Web pages have special characteristics such as structural information, but are at the same time rather heterogeneous. The classification approach used comprises string-to-string matching between words in a term list extracted from the Ei (Engineering Information) thesaurus and classification scheme, and words in the text to be classified. Based on a sample of 70 Web pages, a number of problems with the term list are identified. Reasons for those problems are discussed and improvements proposed. Methods for implementing the improvements are also specified, suggesting further research.
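A minimal sketch of plain string-to-string matching between a term list and a page's text, assuming a term list that maps each lower-cased term to a class. The `TERM_LIST` entries and class labels are invented placeholders, not excerpts from the Ei thesaurus or classification scheme, and `classify` is a hypothetical helper rather than the algorithm evaluated in the paper.

```python
import re
from collections import Counter

# Hypothetical term list: each lower-cased term points to the class it stands
# for. Real entries would be extracted from a controlled vocabulary; these are
# illustrative placeholders only.
TERM_LIST = {
    "heat exchanger": "class_heat_transfer",
    "heat transfer": "class_heat_transfer",
    "neural network": "class_ai",
    "robot": "class_robotics",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify(text, term_list=TERM_LIST):
    """Count exact term matches per class using plain string-to-string matching."""
    tokens = tokenize(text)
    counts = Counter()
    for term, cls in term_list.items():
        term_tokens = term.split()
        n = len(term_tokens)
        # Slide a window of the term's length over the token sequence.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == term_tokens:
                counts[cls] += 1
    return counts
```

Because the matching is exact, an inflected form such as "robots" would not match the term "robot"; this is one generic example of why a bare term list can need refinement before it works well for classification.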

Journal of Documentation | 2014

Enhancing social tagging with automated keywords from the Dewey Decimal Classification

Koraljka Golub; Marianne Lykke; Douglas Tudhope

Purpose – The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC) as an established knowledge organization system (KOS) for enhancing social tagging, with the ultimate aim of improving subject indexing and information retrieval. Design/methodology/approach – Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations: one with uncontrolled social tags only, and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was DDC, which also included mappings from the Library of Congress Subject Headings. Findings – The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: to help produce ideas of which tags to use, to make it easier to find focus for the tagging, to ensure consistency and to increase the number of access points...

ACM/IEEE Joint Conference on Digital Libraries | 2004

Browsing and searching behavior in the Renardus Web service: a study based on log analysis

Traugott Koch; Anders Ardö; Koraljka Golub

Renardus is a distributed Web-based service, which provides integrated searching and browsing access to quality-controlled Web resources. With the overall purpose of improving Renardus, the research aims to study: the detailed usage patterns (quantitative/qualitative, paths through the system); the balance between browsing, searching and mixed activities; typical sequences of usage steps and transition probabilities in a session; typical entry points, referring sites, points of failure and exit points; and the degree to which the browsing support features are used.
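Transition probabilities between usage steps can be estimated from session logs by counting consecutive pairs of steps. The sketch below assumes each session has already been reduced to an ordered list of step labels; the labels such as "search" and "browse" and the function name `transition_probabilities` are hypothetical, and this is a generic first-order estimate rather than the analysis performed on the Renardus logs.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order transition probabilities P(next | current).

    Each session is an ordered list of step labels (hypothetical labels such
    as "search" or "browse"). Returns {current: {next: probability}}.
    """
    pair_counts = defaultdict(Counter)
    for steps in sessions:
        # Count each consecutive (current, next) pair within the session.
        for current, nxt in zip(steps, steps[1:]):
            pair_counts[current][nxt] += 1
    probs = {}
    for current, nexts in pair_counts.items():
        total = sum(nexts.values())
        probs[current] = {nxt: c / total for nxt, c in nexts.items()}
    return probs
```

With the sessions `["browse", "browse", "search"]` and `["search", "browse"]`, a "browse" step is followed by "browse" or "search" with probability 0.5 each, while a "search" step is always followed by "browse". Pairs are counted only within a session, so the last step of one session is never linked to the first step of the next.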

Aslib Proceedings | 2010

An evaluation of enhancing social tagging with a knowledge organization system

Brian Matthews; Catherine Jones; Bartłomiej Puzoń; Jim N. J. Moon; Douglas Tudhope; Koraljka Golub; Marianne Lykke Nielsen

The paper investigates the effect on indexing and retrieval when using only social tagging versus when using social tagging in combination with suggestions from a knowledge organization system. The specific context is that of tagging by Web document readers, using Dewey Decimal Classification, its captions, Relative Index Terms and Library of Congress Subject Headings mapped to the captions. The results showed the importance of knowledge organization system suggestions for both indexing and retrieval: to help produce ideas of tags to use, to make it easier to find focus for the tagging, as well as to ensure consistency and increase the number of access points in retrieval.

Cataloging & Classification Quarterly | 2006

Users' Browsing Behaviour in a DDC-Based Web Service: A Log Analysis

Traugott Koch; Koraljka Golub; Anders Ardö

This study explores the navigation behaviour of all users of a large web service, Renardus, using web log analysis. Renardus provides integrated searching and browsing access to quality-controlled web resources from major individual subject gateway services. The main navigation feature is subject browsing through the Dewey Decimal Classification (DDC) based on mapping of classes of resources from the distributed gateways to the DDC structure. Among the more surprising results are the hugely dominant share of browsing activities, the good use of browsing support features like the graphical fish-eye overviews, rather long and varied navigation sequences, as well as extensive hierarchical directory-style browsing through the large DDC system.

Association for Information Science and Technology | 2016

A framework for evaluating automatic indexing or classification in the context of retrieval

Koraljka Golub; Dagobert Soergel; George Buchanan; Douglas Tudhope; Marianne Lykke; Debra Hiom

Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real‐life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single “gold standard” method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer‐assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.

Journal of Documentation | 2009

Automated classification of Web pages in hierarchical browsing

Koraljka Golub; Marianne Lykke

Purpose – The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing, and then, if it proves useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme. Design/methodology/approach – A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification system for browsing. The classification algorithm was evaluated by the users, who judged the correctness of the automatically assigned classes. Findings – The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Browsing success proved to be correlated with, and dependent on, classification correctness. Research limitations/implications – Further research should address problems of d...

IFLA Journal | 2016

Research data services: An exploration of requirements at two Swedish universities

Monica Lassi; Maria Johnsson; Koraljka Golub

The paper reports on an exploratory study of researchers’ needs for effective research data management at two Swedish universities, conducted in order to inform the ongoing development of research data services. Twelve researchers from diverse fields, including biology, cultural studies, economics, environmental studies, geography, history, linguistics, media and psychology, were interviewed. The interviews were structured, guided by the Data Curation Profiles Toolkit developed at Purdue University, with added questions regarding subject metadata. The preliminary analysis indicates that research data management practices vary greatly among the respondents, and so, therefore, do the implications for research data services. The added questions on subject metadata indicate a need for services that guide researchers in describing their datasets with adequate metadata.
