Jeong Woo Son | Researchain

Publication

Featured researches published by Jeong Woo Son.

pacific rim international conference on artificial intelligence | 2006

Program plagiarism detection using parse tree kernels

Jeong Woo Son; Seong-Bae Park; Se-Young Park

Many existing plagiarism detection systems fail to detect plagiarism when the copied programs contain abundant garbage. This is because they do not use structural information efficiently. In this paper, we propose a novel plagiarism detection system that uses parse tree kernels. By incorporating parse tree kernels, the system efficiently handles the structural information within source programs. A comparison with existing systems such as SID and JPlag shows that the proposed system detects plagiarism more accurately because of its ability to handle structural information.

web intelligence | 2008

Discriminating Meaningful Web Tables from Decorative Tables Using a Composite Kernel

Jeong Woo Son; Jae-An Lee; Seong-Bae Park; Hyun-Je Song; Sang-Jo Lee; Se-Young Park

Information extraction from the World Wide Web has received great attention. Since a table is a well-organized and summarized knowledge expression for a domain, it is of great importance to extract information from tables. However, many tables in web pages are used not to transfer information but to decorate the pages. Therefore, one of the most critical tasks in web table mining is to discriminate meaningful tables from decorative ones. The main obstacle in this task is the difficulty of generating relevant features for the discrimination. This paper proposes a novel method to discriminate them using a composite kernel that combines a parse tree kernel and a linear kernel. Since a web table is represented as a parse tree by an HTML parser, the parse tree kernel can be naturally used to determine the similarity between trees, and the linear kernel with content features is used to make up for the weak points of the parse tree kernel. Support vector machines with the composite kernel distinguish meaningful tables from decorative ones with high accuracy. A series of experiments shows that the proposed method achieves state-of-the-art performance.
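As a rough illustration of the composite-kernel idea in this abstract, the following minimal sketch combines a structural similarity with a linear kernel over content features. Everything specific here is an assumption made for illustration and is not taken from the paper: the structural part only counts (parent tag, child tag) pairs instead of full parse tree fragments, the two content features (cell count and fraction of cells containing digits) are hypothetical, and the equal weighting `alpha=0.5` is arbitrary.

```python
import math
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"img", "br", "hr", "input"}  # tags that never get a closing tag


class TagPairCounter(HTMLParser):
    """Collect (parent tag, child tag) pairs and cell texts from an HTML snippet."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.pairs = Counter()
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if self.stack:
            self.pairs[(self.stack[-1], tag)] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.stack and self.stack[-1] in ("td", "th") and data.strip():
            self.cells.append(data.strip())


def describe(html):
    """Return (structural fragment counts, content feature vector) for a table."""
    parser = TagPairCounter()
    parser.feed(html)
    parser.close()
    cells = parser.cells
    numeric = sum(any(ch.isdigit() for ch in c) for c in cells)
    # Hypothetical content features: cell count, fraction of cells with digits.
    content = [float(len(cells)), numeric / len(cells) if cells else 0.0]
    return parser.pairs, content


def normalized(k_ab, k_aa, k_bb):
    """Cosine-style normalization so that a kernel value lies in [0, 1]."""
    denom = math.sqrt(k_aa * k_bb)
    return k_ab / denom if denom else 0.0


def structure_kernel(a, b):
    dot = lambda p, q: sum(n * q[frag] for frag, n in p.items())
    return normalized(dot(a, b), dot(a, a), dot(b, b))


def linear_kernel(u, v):
    dot = lambda p, q: sum(x * y for x, y in zip(p, q))
    return normalized(dot(u, v), dot(u, u), dot(v, v))


def composite_kernel(html_a, html_b, alpha=0.5):
    """Weighted sum of a structural kernel and a linear content kernel."""
    (struct_a, content_a), (struct_b, content_b) = describe(html_a), describe(html_b)
    return (alpha * structure_kernel(struct_a, struct_b)
            + (1.0 - alpha) * linear_kernel(content_a, content_b))


data_table = ("<table><tr><th>Year</th><th>Sales</th></tr>"
              "<tr><td>2007</td><td>120</td></tr></table>")
layout_table = ("<table><tr><td><img src='logo.png'></td>"
                "<td><table><tr><td>Menu</td></tr></table></td></tr></table>")

print(composite_kernel(data_table, data_table))
print(composite_kernel(data_table, layout_table))
```

A non-negatively weighted sum of two valid kernels is itself a valid kernel, so a function of this shape could in principle be supplied to any SVM implementation that accepts a precomputed kernel matrix.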

international conference on advanced language processing and web information technology | 2007

A Convolution Kernel Method for Color Recognition

Jeong Woo Son; Seong-Bae Park; Ku-Jin Kim

Color recognition for outdoor images is important for low-level computer vision, but it is a difficult task because of environmental effects such as illumination and weather. In this paper, we propose a novel convolution kernel method to extract color information from outdoor images. When two images are compared, the proposed kernel maps the images onto a high-dimensional feature space whose features are image fragments of the two images, and the similarity between them is then obtained through the inner product of the two image vectors. To evaluate the proposed kernel, it is applied to the vehicle color recognition problem. In experiments on 500 vehicle images, the vehicle color recognition model with the proposed kernel achieves about 92% precision and 92% recall. On the other hand, the model with a linear kernel achieves about 45% precision and 45% recall. These experimental results imply that the proposed kernel is a plausible approach for the color recognition task.

Engineering Applications of Artificial Intelligence | 2013

An application for plagiarized source code detection based on a parse tree kernel

Jeong Woo Son; Tae-Gil Noh; Hyun-Je Song; Seong-Bae Park

Program plagiarism detection is the task of detecting plagiarized code pairs among a set of source codes. In this paper, we propose a code plagiarism detection system that uses a parse tree kernel. Our parse tree kernel calculates a similarity value between two source codes in terms of their parse tree similarity. Since parse trees contain the essential syntactic structure of source codes, the system effectively handles structural information. The contributions of this paper are two-fold. First, we propose a parse tree kernel that is optimized for program source code. The evaluation shows that our system based on this kernel outperforms well-known baseline systems. Second, we collected a large number of real-world Java source codes from a university programming class. This test set was manually analyzed and tagged by two independent human annotators to mark plagiarized codes. It can be used to evaluate the performance of various detection systems in real-world environments. The experiments with the test set show that the performance of our plagiarism detection system reaches 93% of the level of human annotators.
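To make the notion of parse tree similarity concrete, here is a minimal sketch under loud assumptions: it uses Python's standard `ast` module rather than a Java parser, and it compares only depth-1 fragments (a node type together with its child node types), which is a much simpler kernel than the optimized one the paper describes. The function names and sample programs are hypothetical.

```python
import ast
import math
from collections import Counter


def production_counts(source: str) -> Counter:
    """Count depth-1 tree fragments: a node type plus its child node types."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        children = tuple(type(c).__name__ for c in ast.iter_child_nodes(node))
        if children:  # leaves carry no structure of their own
            counts[(type(node).__name__, children)] += 1
    return counts


def tree_kernel(a: Counter, b: Counter) -> float:
    """Inner product of the two fragment-count vectors."""
    return float(sum(n * b[frag] for frag, n in a.items()))


def normalized_similarity(src_a: str, src_b: str) -> float:
    """Kernel value scaled to [0, 1] so that program length matters less."""
    a, b = production_counts(src_a), production_counts(src_b)
    denom = math.sqrt(tree_kernel(a, a) * tree_kernel(b, b))
    return tree_kernel(a, b) / denom if denom else 0.0


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def add_all(items):\n    acc = 0\n    for it in items:\n        acc += it\n    return acc\n"
different = "def greet(name):\n    print('hello', name)\n"

print(normalized_similarity(original, renamed))
print(normalized_similarity(original, different))
```

Because identifiers are not part of the fragments, renaming every variable leaves the similarity unchanged, which is the kind of robustness to superficial edits that a structure-based comparison is meant to provide.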

web intelligence | 2012

Location Comparison through Geographical Topics

Jeong Woo Son; Yunseok Noh; Hyun-Je Song; Seong-Bae Park

With the increasing interest in location-based services, location comparison is gaining more and more attention. One of the best ways to represent a location is to use topics that are generated near the location. In order to compare locations through such geographical topics, two conditions need to be considered. One is that the topic set should be fixed but cover various aspects of all possible locations, and the other is that geographical topics often depend on each other. This paper proposes Probabilistic Explicit Semantic Analysis (PESA), which addresses both conditions. PESA represents a location as a weighted topic vector where each topic is a Wikipedia concept. The number of Wikipedia concepts is fixed, but their enormous quantity allows PESA to be used to compare various locations. In addition, link information within Wikipedia articles is used to compute prior probabilities of topics considering their dependencies. That is, it enables PESA to model topic dependency. PESA was evaluated using eighteen locations in three distinct geographical categories and compared with LDA and ESA. The experimental results show that PESA outperformed LDA and ESA, highlighting its superiority in location comparison.

asian conference on machine learning | 2009

Coping with Distribution Change in the Same Domain Using Similarity-Based Instance Weighting

Jeong Woo Son; Hyun-Je Song; Seong-Bae Park; Se-Young Park

Lexicons are considered the most crucial features in natural language processing (NLP) and are thus often used in machine learning algorithms applied to NLP tasks. However, because of the diversity of the lexical space, machine learning algorithms with lexical features suffer from the difference between the distributions of training and test data. To overcome this distribution change, this paper proposes support vector machines with example-wise weights. The training distribution is brought in line with the test distribution by weighting training examples according to their similarity to all test data. The experimental results on text chunking show that the distribution change between training and test data is actually recognized, and that the proposed method, which considers this change in its training phase, outperforms ordinary support vector machines.
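The weighting step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity, averaging over the test set, and rescaling the weights to a mean of 1 are choices made here for simplicity, not the weighting function defined in the paper, and the toy vectors are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def instance_weights(train, test):
    """Weight each training vector by its mean similarity to all test vectors."""
    raw = [sum(cosine(x, t) for t in test) / len(test) for x in train]
    total = sum(raw)
    if not total:
        return [1.0] * len(raw)  # fall back to uniform weights
    # Rescale so the weights average to 1, keeping the overall loss scale.
    return [w * len(raw) / total for w in raw]


# Toy unlabeled feature vectors (hypothetical data).
train = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
test = [[1.0, 0.0, 0.0], [2.0, 0.0, 1.0]]

weights = instance_weights(train, test)
print(weights)
```

Weights of this kind can then be handed to a learner that supports per-example weights; scikit-learn's `SVC.fit`, for instance, accepts a `sample_weight` argument.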

Journal of Information Science and Engineering | 2015

Ontology Kernel: A Convolution Kernel for Ontology Alignment

Jeong Woo Son; Hee-Geun Yoon; Seong-Bae Park

Every ontology entity, such as a concept or a property, has its own structural information, which is represented as a graph because of its relations with other entities. Therefore, it is important to consider not only lexical similarity but also structural similarity in ontology alignment. This paper proposes an ontology kernel that computes both types of similarity simultaneously. The idea of this kernel is to measure the structural similarity of ontology entities by mapping their entity graphs into the space spanned by entity random walks. The graph of an entity in the kernel expresses all of its relations with other entities. Thus, the ontology kernel can measure the similarity between entities no matter how complex the entities are and no matter how many kinds of relations they possess. A series of experiments with standard data sets demonstrates the generality and the superiority of the ontology kernel in ontology alignment.

advances in social networks analysis and mining | 2013

Identifying user attributes through non-i.i.d. multi-instance learning

Hyun-Je Song; Jeong Woo Son; Seong-Bae Park

User attributes are an essential factor in personalized recommendation and targeted advertising. Therefore, there have been a number of studies on identifying user attributes automatically from SNS postings, since the postings reveal various attributes of their writers. Many kinds of machine learning methods have been applied to the automatic identification of user attributes, but they suffer from two major problems. First, there are many postings in SNS that do not deliver any information about their writers, so learning from SNS postings results in a model biased by these irrelevant postings. Second, the postings of an SNS user are somewhat related to one another. However, most machine learning methods ignore this information, since they assume that data are independently and identically distributed. To solve these problems in user attribute identification, this paper proposes a novel method based on non-i.i.d. multi-instance learning. Since multi-instance learning treats all postings by a user as a bag and learns user attribute identification with such bags, not with individual postings, the first problem is solved. In addition, the proposed method assumes that the postings by a single user have a structure. By incorporating this assumption into multi-instance learning, the second problem is solved. Our experimental results show that addressing these two problems in automatic user attribute identification improves performance.

Applied Soft Computing | 2013

Web table discrimination with composition of rich structural and content information

Jeong Woo Son; Seong-Bae Park

A table is a well-organized and summarized knowledge expression for a domain. Therefore, it is of great importance to extract information from tables. However, many tables in Web pages are used not to transfer information but to decorate pages. One of the most critical tasks in Web table mining is thus to discriminate meaningful tables from decorative ones. The main obstacle in this task is the difficulty of generating relevant features for discrimination. This paper proposes a novel discrimination method using a composite kernel that combines parse tree kernels and a linear kernel. Because a Web table is represented as a parse tree by an HTML parser, it is natural to represent the structural information of a table as a parse tree. In this paper, two types of parse trees are used to represent structural information within and around a table. These two trees define the structure kernel that handles the structural information of tables. The contents of a Web table are handled by a linear kernel with content features. Support vector machines with the composite kernel distinguish meaningful tables from decorative ones with high accuracy. A series of experiments shows that the proposed method achieves state-of-the-art performance.

international conference on neural information processing | 2011

Expanding knowledge source with ontology alignment for augmented cognition

Jeong Woo Son; Seongtaek Kim; Seong-Bae Park; Yunseok Noh; Junho Go

Augmented cognition on sensory data requires knowledge sources to expand the abilities of human senses. Ontologies are one of the most suitable knowledge sources, since they are designed to represent human knowledge, and a number of ontologies on diverse domains can cover various objects in human life. To adopt ontologies as knowledge sources for augmented cognition, various ontologies for a single domain should be merged to prevent noisy and redundant information. This paper proposes a novel composite kernel to merge heterogeneous ontologies. The proposed kernel consists of lexical and graph kernels specialized to reflect the structural and lexical information of ontology entities. In experiments, the composite kernel handles both structural and lexical information on ontologies more efficiently than other kernels designed to deal with general graph structures. The experimental results also show that the proposed kernel achieves performance comparable to that of the top-five systems in OAEI 2010.
