Joongmin Choi | Researchain

Publication

Featured researches published by Joongmin Choi.

IEEE Transactions on Consumer Electronics | 2010

Repetition-based web page segmentation by detecting tag patterns for small-screen devices

Jinbeom Kang; Jaeyoung Yang; Joongmin Choi

Web page segmentation into logical blocks is an important preprocessing step for recognizing informative content blocks in a page, which leads to efficient information extraction and convenient display on devices with small screens. Previous methods for Web page segmentation are not flexible in a dynamic Web environment because they rely largely on heuristic rules generated by exploiting structural tags and visual information inherent in a page. To resolve this problem, this paper proposes a new method of Web page segmentation by recognizing repetitive tag patterns, called key patterns, in the DOM tree structure of a page. We report on the Repetition-based Page Segmentation (REPS) algorithm, which detects key patterns in a page and generates virtual nodes to correctly segment nested blocks. A series of experiments performed on real Web sites showed that REPS greatly contributes to improving the correctness of Web page segmentation.
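The idea of a repetitive tag pattern can be illustrated with a small sketch. This is not the REPS algorithm itself: it assumes the children of one DOM node have already been flattened into a list of tag names, and the function name find_key_pattern and the max_len limit are hypothetical choices made only for this illustration. It looks for the tag sequence whose back-to-back repetitions cover the most siblings.

```python
def find_key_pattern(tags, max_len=4):
    """Return the tag n-gram whose consecutive repetitions cover the most tags.

    `tags` is a flat list of sibling tag names (an assumption of this sketch).
    Returns None when no pattern repeats at least twice in a row.
    """
    best_pattern, best_covered = None, 0
    for n in range(1, max_len + 1):
        for start in range(len(tags) - 2 * n + 1):
            pattern = tuple(tags[start:start + n])
            repeats = 1
            pos = start + n
            # Count how many times the pattern repeats immediately after itself.
            while tuple(tags[pos:pos + n]) == pattern:
                repeats += 1
                pos += n
            covered = repeats * n
            # Strict comparison keeps the shorter, earlier pattern on ties.
            if repeats >= 2 and covered > best_covered:
                best_pattern, best_covered = pattern, covered
    return best_pattern


siblings = ["h1", "img", "a", "p", "img", "a", "p", "img", "a", "p", "div"]
key_pattern = find_key_pattern(siblings)
```

Each repetition of the returned pattern could then be wrapped in a grouping node of its own, which is roughly the role the abstract assigns to virtual nodes.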

adaptive agents and multi-agents systems | 2001

MORPHEUS: a more scalable comparison-shopping agent

Jaeyoung Yang; Hee-Kyoung Seo; Joongmin Choi

This paper proposes a more scalable comparison-shopping agent named MORPHEUS. MORPHEUS presents a simple but robust inductive learning algorithm that automatically constructs wrappers.

Scientometrics | 2014

Author name disambiguation using a graph model with node splitting and merging based on bibliographic information

Dongwook Shin; Taehwan Kim; Joongmin Choi; Jungsun Kim

Author ambiguity mainly arises when several different authors express their names in the same way, generally known as the namesake problem, and also when the name of an author is expressed in many different ways, referred to as the heteronymous name problem. These author ambiguity problems have long been an obstacle to efficient information retrieval in digital libraries, causing incorrect identification of authors and impeding correct classification of their publications. It is a nontrivial task to distinguish those authors, especially when there is very limited information about them. In this paper, we propose a graph-based approach to author name disambiguation, where a graph model is constructed using the co-author relations, and author ambiguity is resolved by graph operations such as vertex (or node) splitting and merging based on the co-authorship. In our framework, called a Graph Framework for Author Disambiguation (GFAD), the namesake problem is solved by splitting an author vertex involved in multiple cycles of co-authorship, and the heteronymous name problem is handled by merging multiple author vertices having similar names if those vertices are connected to a common vertex. Experiments were carried out with the real DBLP and Arnetminer collections, and the performance of GFAD was compared with three representative unsupervised author name disambiguation systems. We confirm that GFAD shows better overall performance from the perspective of representative evaluation metrics. An additional contribution is that we released the refined DBLP collection to the public to facilitate organizing a performance benchmark for future systems on author disambiguation.
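The merging step for the heteronymous name problem can be sketched as follows. This is a minimal illustration under stated assumptions, not the GFAD implementation: the co-author graph is a plain dictionary mapping each name to a set of co-author names, and similar_names is a hypothetical similarity test (same last name and same first initial) standing in for whatever name comparison the paper actually uses. The splitting step for namesakes is not shown.

```python
from itertools import combinations


def similar_names(a, b):
    """Hypothetical test: same last name and same first initial."""
    a_parts, b_parts = a.lower().split(), b.lower().split()
    return a_parts[-1] == b_parts[-1] and a_parts[0][0] == b_parts[0][0]


def merge_heteronyms(graph):
    """Merge similarly named vertices that share at least one co-author.

    `graph` maps an author name to the set of that author's co-authors.
    The input is not modified; a merged copy is returned.
    """
    merged = {name: set(neighbors) for name, neighbors in graph.items()}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(sorted(merged), 2):
            if similar_names(a, b) and merged[a] & merged[b]:
                # Keep the longer (more complete) spelling of the name.
                keep, drop = (a, b) if len(a) >= len(b) else (b, a)
                merged[keep] |= merged.pop(drop)
                # Redirect every edge that pointed at the dropped vertex.
                for neighbors in merged.values():
                    if drop in neighbors:
                        neighbors.discard(drop)
                        neighbors.add(keep)
                merged[keep].discard(keep)
                changed = True
                break  # restart the scan because the vertex set changed
    return merged


coauthors = {
    "A. Smith": {"B. Jones"},
    "Alice Smith": {"B. Jones", "C. Lee"},
    "B. Jones": {"A. Smith", "Alice Smith"},
    "C. Lee": {"Alice Smith"},
}
result = merge_heteronyms(coauthors)
```

Here "A. Smith" and "Alice Smith" are merged because their names are similar and both are connected to "B. Jones", mirroring the condition stated in the abstract.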

multimedia and ubiquitous engineering | 2008

Ontology-Based User Intention Recognition for Proactive Planning of Intelligent Robot Behavior

Hochul Jeon; Taehwan Kim; Joongmin Choi

Recognizing user intention proactively and performing a suitable action or service is one of the important issues for intelligent robots. Even when users perform the same behavior, their intention may differ according to the user's context. This means that user intention recognition involves uncertainties, and minimizing those uncertainties can improve the accuracy of user intention recognition. This paper suggests a novel ontology-based approach for user intention recognition. We propose a method of minimizing the uncertainties that are the main obstacles to the precise recognition of user intention. This approach creates an ontology for user intention, builds a hierarchy and relationships among user intentions, and precisely recognizes user intention by using gathered sensor data such as temperature, humidity, vision, and auditory data. We developed a simulator that evaluates the performance of the robot's proactive planning mechanism.

web intelligence | 2006

Topic-Specific Web Content Adaptation to Mobile Devices

Eunshil Lee; Jinbeom Kang; Joongmin Choi; Jaeyoung Yang

Mobile content adaptation is a technology for effectively presenting content originally built for the desktop PC on wireless mobile devices. Previous approaches to Web content adaptation are mostly device-dependent. Also, the content transformation to suit a smaller device is done manually. As a result, the user has difficulty selecting relevant information from a large volume of content, since the context information related to the content is not provided. To resolve these problems, this paper proposes an enhanced method of Web content adaptation for mobile devices. In our system, the process of Web content adaptation consists of four stages: block filtering, block title extraction, block content summarization, and personalization through learning. As a result of learning, personalization is realized by showing the information for the relevant block at the top of the content list.

web intelligence | 2007

Web Document Clustering by Using Automatic Keyphrase Extraction

Juhyun Han; Taehwan Kim; Joongmin Choi

In most traditional techniques of document clustering, the total number of clusters is not known in advance, and the cluster that contains the target information cannot be determined since the semantic nature is not associated with the cluster. The well-known K-means clustering algorithm partially solves these problems by allowing users to specify the number of clusters. However, if the pre-specified number of clusters is modified, the precision of each result also changes. To solve this problem, this paper proposes a new clustering algorithm based on the Kea keyphrase extraction algorithm, which returns several keyphrases from the source documents by using machine learning techniques. In this paper, documents are grouped into several clusters as in K-means, but the number of clusters is automatically determined by the algorithm with some heuristics using the extracted keyphrases. Our Kea-means clustering algorithm provides easy and efficient ways to extract test documents from massive quantities of resources.

international symposium on information technology convergence | 2007

Detecting Informative Web Page Blocks for Efficient Information Extraction Using Visual Block Segmentation

Jinbeom Kang; Joongmin Choi

As the structure of a Web page is getting more complicated, the construction of wrapper induction rules becomes more difficult and time-consuming. The main problem in most wrapper induction methods is the difficulty in discriminating the meaningful blocks that contain the target information from the noise blocks that contain irrelevant information such as advertisements, menus, or copyright statements. To solve this problem, this paper proposes the RIPB (recognizing informative page blocks) algorithm, which detects the informative blocks in a Web page by exploiting the visual block segmentation scheme. RIPB uses the visual page segmentation algorithm to analyze and partition a Web page into a set of logical blocks, then groups related blocks with similar structures into a block cluster and recognizes the informative block clusters by applying some heuristic rules to the cluster information. The results of a series of experiments indicate that RIPB contributes to improving the accuracy of information extraction by allowing the wrapper induction module to focus only on the informative block information and ignore other noise information in building extraction rules.

international conference on information science and applications | 2011

An Ontology-Based Recommendation System Using Long-Term and Short-Term Preferences

Jinbeom Kang; Joongmin Choi

Personalized information retrieval and recommendation systems have been proposed to deliver the right information to users with different interests. However, most previous systems use keyword frequencies as the main factor for personalization, and as a result, they cannot analyze semantic relations between words. Also, previous methods often fail to provide documents that are semantically related to the query words. To solve these problems, we propose a recommendation system that provides relevant documents to users by identifying semantic relations between an ontology that semantically represents the documents crawled by a Web robot and user behavior history. Recommendation is mainly based on content-based similarity, semantic similarity, and preference weights.

international symposium on industrial electronics | 2001

Building intelligent systems for mining information extraction rules from web pages by using domain knowledge

Keekyoung Seo; Jaeyoung Yang; Joongmin Choi

Previous research on automatic information extraction experienced difficulties in acquiring and representing useful domain knowledge and in coping with the structural heterogeneity among different information sources. As a result, many real-world information sources with complex document structures could not be correctly analyzed. In order to resolve these problems, this paper presents a method of building intelligent systems for mining information extraction rules from semi-structured Web pages by using domain knowledge. This system automatically generates a wrapper for each information source and performs information extraction and information integration by applying this wrapper to the corresponding source. Both the domain knowledge and the wrapper are represented by XML documents to increase flexibility and interoperability. By testing our prototype system on several real-estate information sites, we can claim that it creates the correct wrappers for most Web sources and consequently facilitates effective information extraction for heterogeneous information sources.

intelligent data engineering and automated learning | 2000

A Shopping Agent That Automatically Constructs Wrappers for Semi-Structured Online Vendors

Jaeyoung Yang; Eunseok Lee; Joongmin Choi

This paper proposes a shopping agent with a robust inductive learning method that automatically constructs wrappers for semi-structured online stores. Strong biases assumed in many existing systems are weakened so that real stores with reasonably complex document structures can be handled. Our method treats a logical line as a basic unit, and recognizes the position and the structure of product descriptions by finding the most frequent pattern from the sequence of logical line information in output HTML pages. This method is capable of analyzing product descriptions that comprise multiple logical lines, and even those with extra or missing attributes. Experimental tests on over 60 sites show that it successfully constructs correct wrappers for most real stores.
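Finding the most frequent pattern in a sequence of logical lines can be sketched in a few lines. This is an illustration only, assuming each logical line has already been reduced to a category label such as "IMG", "TITLE", or "PRICE"; those labels, the fixed pattern length, and the function name most_frequent_pattern are hypothetical and are not taken from the paper. Handling of extra or missing attributes is not covered.

```python
from collections import Counter


def most_frequent_pattern(line_types, length):
    """Return (pattern, count) for the most common window of `length` labels.

    `line_types` is a list of category labels, one per logical line
    (an assumption of this sketch). Returns (None, 0) if the sequence
    is shorter than `length`.
    """
    windows = [
        tuple(line_types[i:i + length])
        for i in range(len(line_types) - length + 1)
    ]
    if not windows:
        return None, 0
    # Counter.most_common(1) yields a one-element list of (item, count).
    return Counter(windows).most_common(1)[0]


lines = ["TEXT", "IMG", "TITLE", "PRICE", "IMG", "TITLE", "PRICE",
         "IMG", "TITLE", "PRICE", "TEXT"]
pattern, count = most_frequent_pattern(lines, 3)
```

The positions where the winning pattern occurs would then mark candidate product descriptions, from which a wrapper could be derived.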
