Publication


Featured research published by Yunhyong Kim.


Information Processing and Management | 2012

Automatically structuring domain knowledge from text: An overview of current research

Malcolm Clark; Yunhyong Kim; Udo Kruschwitz; Dawei Song; Dyaa Albakour; Stephen Dignum; Ulises Cerviño Beresi; Maria Fasli; Anne N. De Roeck

This paper presents an overview of automatic methods for building domain knowledge structures (domain models) from text collections. Applications of domain models have a long history within knowledge engineering and artificial intelligence. In the last couple of decades they have surfaced noticeably as a useful tool within natural language processing, information retrieval and semantic web technology. Inspired by the ubiquitous propagation of domain model structures that are emerging in several research disciplines, we give an overview of the current research landscape and some techniques and approaches. We will also discuss trade-offs between different approaches and point to some recent trends.


Hawaii International Conference on System Sciences | 2008

Examining Variations of Prominent Features in Genre Classification

Yunhyong Kim; Seamus Ross

This paper investigates the correlation between features of three types (visual, stylistic and topical) and genre classes. The majority of previous studies in automated genre classification have created models based on an amalgamated representation of a document using a combination of features. In these models, the inseparable roles of different features make it difficult to determine a means of improving the classifier when it exhibits poor performance in detecting selected genres. In this paper we use classifiers independently modeled on three groups of features to examine six genre classes and show that the strongest features for making one classification are not necessarily the best features for carrying out another.


International Journal of Digital Curation | 2008

“The Naming of Cats”: Automated Genre Classification

Yunhyong Kim; Seamus Ross

This paper builds on the work presented at ECDL 2006 in automated genre classification as a step toward automating metadata extraction from digital documents for ingest into digital repositories such as those run by archives, libraries and eprint services (Kim & Ross, 2006b). We have previously proposed dividing features of a document into five types (features for visual layout, language model features, stylometric features, features for semantic structure, and contextual features as an object linked to previously classified objects and other external sources) and have examined visual and language model features. The current paper compares results from testing classifiers based on image and stylometric features in a binary classification to show that certain genres have strong image features which enable effective separation of documents belonging to the genre from a large pool of other documents.


European Conference on Research and Advanced Technology for Digital Libraries | 2006

Genre classification in automated ingest and appraisal metadata

Yunhyong Kim; Seamus Ross

Metadata creation is a crucial aspect of the ingest of digital materials into digital libraries. Metadata needed to document and manage digital materials are extensive, and creating them manually is expensive. The Digital Curation Centre (DCC) has undertaken research to automate this process for some classes of digital material. We have segmented the problem and this paper discusses results in genre classification as a first step toward automating metadata extraction from documents. Here we propose a classification method built on looking at the documents from five directions: as an object exhibiting a specific visual format, as a linear layout of strings with characteristic grammar, as an object with stylometric signatures, as an object with intended meaning and purpose, and as an object linked to previously classified objects and other external sources. The results of some experiments in relation to the first two directions are described here; they are meant to be indicative of the promise underlying this multi-faceted approach.


International Journal on Digital Libraries | 2010

Why did you pick that? Visualising relevance criteria in exploratory search

Ulises Cerviño Beresi; Yunhyong Kim; Dawei Song; Ian Ruthven

In this article, we present a set of approaches for analysing data gathered during experimentation with exploratory search systems and users' acts of judging the relevance of the information retrieved by the system. We present three tools for quantitatively analysing encoded qualitative data: relevance-criteria profile, relevance-judgement complexity and session visualisation. Relevance-criteria profiles capture the prominence of each criterion's usage with respect to the search sessions of individuals or selected user groups (e.g. groups defined by the users' affiliations and/or level of research experience). Relevance-judgement complexity, on the other hand, reflects the number of criteria involved in a single judgement process. Finally, session visualisation brings these results together in a sequential representation of criteria usage and relevance judgements throughout a session, potentially allowing the researcher to quickly detect emerging patterns with respect to interactions, relevance criteria usage and complexity. The use of these tools is demonstrated using results from a pilot user study that was conducted at the Robert Gordon University in 2008. We conclude by highlighting how these tools might be used to support the improvement of end-user services in digital libraries.
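The two quantitative measures named in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the criterion labels, and the choice to express the profile as relative frequencies are all assumptions made for illustration, and the sample session is invented.

```python
from collections import Counter


def criteria_profile(judgements):
    """Relative frequency of each criterion across a set of judgements.

    Assumption: each judgement is a list of criterion labels, and a
    criterion counts at most once per judgement.
    """
    counts = Counter(c for j in judgements for c in set(j))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {criterion: n / total for criterion, n in counts.items()}


def judgement_complexity(judgement):
    """Number of distinct criteria involved in a single judgement."""
    return len(set(judgement))


# Hypothetical encoded session: three relevance judgements.
session = [
    ["topicality", "recency"],
    ["topicality"],
    ["topicality", "authority", "recency"],
]

profile = criteria_profile(session)
complexities = [judgement_complexity(j) for j in session]
```

Grouping sessions by user affiliation or experience level before calling `criteria_profile` would give per-group profiles of the kind the abstract mentions.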


Data Science Journal | 2007

Detecting Family Resemblance: Automated Genre Classification

Yunhyong Kim; Seamus Ross

This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The current paper compares the role of visual layout, stylistic features, and language model features in clustering documents and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of materials populated with documents of the nineteen most popular genres found in our experimental data set.


NLP4DL'09/AT4DL'09 Proceedings of the 2009 international conference on Advanced language technologies for digital libraries | 2009

Moving towards adaptive search in digital libraries

Udo Kruschwitz; M-Dyaa Albakour; Jinzhong Niu; Johannes Leveling; Nikolaos Nanas; Yunhyong Kim; Dawei Song; Maria Fasli; Anne N. De Roeck

Search applications have become very popular over the last two decades, one of the main drivers being the advent of the Web. Nevertheless, searching on the Web is very different to searching on smaller, often more structured collections such as digital libraries, local Web sites, and intranets. One way of helping the searcher locate the right information for a specific information need in such a collection is by providing well-structured domain knowledge to assist query modification and navigation. There are two main challenges, both of which we address in this chapter: acquiring the domain knowledge and adapting it automatically to the specific interests of the user community. We will outline how in digital libraries a domain model can automatically be acquired using search engine query logs and how it can be continuously updated using methods resembling ant colony behaviour.


European Conference on Information Retrieval | 2011

AutoEval: An Evaluation Methodology for Evaluating Query Suggestions Using Query Logs

M-Dyaa Albakour; Udo Kruschwitz; Nikolaos Nanas; Yunhyong Kim; Dawei Song; Maria Fasli; Anne N. De Roeck

User evaluations of search engines are expensive and not easy to replicate. The problem is even more pronounced when assessing adaptive search systems, for example system-generated query modification suggestions that can be derived from past user interactions with a search engine. Automatically predicting the performance of different modification suggestion models before getting the users involved is therefore highly desirable. AutoEval is an evaluation methodology that assesses the quality of query modifications generated by a model using the query logs of past user interactions with the system. We present experimental results of applying this methodology to different adaptive algorithms which suggest that the predicted quality of different algorithms is in line with user assessments. This makes AutoEval a suitable evaluation framework for adaptive interactive search engines.


European Conference on Research and Advanced Technology for Digital Libraries | 2010

Relevance in technicolor

Ulises Cerviño Beresi; Yunhyong Kim; Dawei Song; Ian Ruthven; Mark Baillie

In this article we propose the concept of relevance criteria profiles, which provide a global view of user behaviour in judging the relevance of retrieved information. We further propose a plotting technique which provides a session-based overview of the relevance judgement processes interlaced with interactions that allow the researcher to visualise and quickly detect emerging patterns in both interactions and relevance criteria usage. We discuss by example, using data from a user study conducted between January and August 2008, how these tools support a better understanding of task-based user valuation of documents that is likely to lead to recommendations for improving end-user services in digital libraries.


Archive | 2010

Formulating Representative Features with Respect to Genre Classification

Yunhyong Kim; Seamus Ross

Document classification is one of the most fundamental steps in enabling the search, selection, and ranking of digital material according to its relevance in answering a predefined search. As such it is a valuable means of knowledge discovery and an essential part of the effective and efficient management of digital documents in a repository, library, or archive.

Collaboration


Dive into Yunhyong Kim's collaborations.

Top Co-Authors

Ian Ruthven

University of Strathclyde

Vangelis Banos

Aristotle University of Thessaloniki
