Hyun-Je Song | Researchain

Publication

Featured research published by Hyun-Je Song.

North American Chapter of the Association for Computational Linguistics | 2016

A Translation-Based Knowledge Graph Embedding Preserving Logical Property of Relations

Hee-Geun Yoon; Hyun-Je Song; Seong-Bae Park; Se-Young Park

This paper proposes a novel translation-based knowledge graph embedding that preserves the logical properties of relations, such as transitivity and symmetry. The embedding spaces generated by existing translation-based embeddings do not represent transitive and symmetric relations precisely, because they ignore the role of entities in triples. We therefore introduce a role-specific projection that maps an entity to distinct vectors according to its role in a triple: a head entity is projected onto the embedding space by a head projection operator, and a tail entity by a tail projection operator. This idea is applied to TransE, TransR, and TransD to produce lppTransE, lppTransR, and lppTransD, respectively. In experiments on link prediction and triple classification, the proposed logical-property-preserving embeddings achieve state-of-the-art performance on both tasks. These results show that preserving the logical properties of relations is critical when embedding knowledge graphs, and that the proposed method does so effectively.
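
Why separate head and tail projections help can be seen with a toy scoring function. This is a minimal sketch, assuming plain Python lists as vectors and a single hypothetical pair of linear projection matrices `m_head` and `m_tail`; the parametrization actually used by lppTransE, lppTransR, and lppTransD may differ, and no training is shown. It contrasts TransE's dissimilarity ||h + r - t|| with ||M_head h + r - M_tail t|| on a symmetric relation.

```python
import math

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transe_score(h, r, t):
    """TransE dissimilarity ||h + r - t||_2; lower means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def role_specific_score(h, r, t, m_head, m_tail):
    """Hypothetical role-specific variant ||M_head h + r - M_tail t||_2.

    The same entity is mapped to a different vector depending on whether
    it appears as the head or the tail of the triple."""
    return transe_score(matvec(m_head, h), r, matvec(m_tail, t))

# Toy symmetric relation: both (a, r, b) and (b, r, a) should hold.
a, b = [1.0, 0.0], [0.0, 1.0]

# TransE: fitting (a, r, b) exactly forces r = b - a, which breaks (b, r, a).
r_transe = [-1.0, 1.0]
print(transe_score(a, r_transe, b), transe_score(b, r_transe, a))

# Role-specific projections: identity for heads, coordinate swap for tails.
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
r_role = [0.0, 0.0]
print(role_specific_score(a, r_role, b, identity, swap),
      role_specific_score(b, r_role, a, identity, swap))
```

In TransE, requiring h + r = t and t + r = h forces r = 0 and h = t, so two distinct entities cannot be symmetric partners; distinct head and tail operators remove that constraint, which matches the intuition given in the abstract.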

Web Intelligence | 2008

Discriminating Meaningful Web Tables from Decorative Tables Using a Composite Kernel

Jeong Woo Son; Jae-An Lee; Seong-Bae Park; Hyun-Je Song; Sang-Jo Lee; Se-Young Park

Information extraction from the World Wide Web has received great attention. Since a table is a well-organized and summarized expression of knowledge about a domain, extracting information from tables is of great importance. However, many tables in web pages are used not to convey information but to decorate the pages. Therefore, discriminating meaningful tables from decorative ones is one of the most critical tasks in web table mining. The main obstacle in this task is the difficulty of generating relevant features for the discrimination. This paper proposes a novel method to discriminate them using a composite kernel that combines a parse tree kernel and a linear kernel. Since an HTML parser represents a web table as a parse tree, the parse tree kernel can naturally be used to determine the similarity between trees, and the linear kernel with content features is used to compensate for the weak points of the parse tree kernel. Support vector machines with the composite kernel distinguish meaningful tables from decorative ones with high accuracy. A series of experiments shows that the proposed method achieves state-of-the-art performance.
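
A composite kernel of this kind can be sketched as a weighted sum of a tree kernel over the HTML parse tree and a dot product over content features. This is a minimal sketch under stated assumptions: the tree kernel is a generic Collins-Duffy-style subset-tree kernel, trees are `(label, [children])` tuples, and the mixing weight `alpha` and decay `lam` are hypothetical parameters; the abstract does not specify how the two kernels are combined.

```python
from itertools import product

# A tree is a (label, [children]) tuple, e.g. ("table", [("tr", [...])]).

def nodes(tree):
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def production(node):
    """A node's 'production': its label plus the ordered labels of its children."""
    label, children = node
    return label, tuple(child[0] for child in children)

def common_fragments(n1, n2, lam):
    """Weighted count of common tree fragments rooted at n1 and n2;
    each fragment contributes lam ** (number of its nodes)."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1], n2[1]):
        score *= 1.0 + common_fragments(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=0.5):
    """Parse tree kernel: sum of common fragments over all node pairs."""
    return sum(common_fragments(n1, n2, lam) for n1, n2 in product(nodes(t1), nodes(t2)))

def linear_kernel(x, y):
    """Dot product of content feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def composite_kernel(t1, x1, t2, x2, alpha=0.5, lam=0.5):
    """Hypothetical convex combination of the structural and content kernels."""
    return alpha * tree_kernel(t1, t2, lam) + (1 - alpha) * linear_kernel(x1, x2)

table = ("table", [("tr", [("td", []), ("td", [])])])
print(tree_kernel(table, table, lam=1.0))  # 13.0
```

Because both parts are valid kernels and `alpha` lies in [0, 1], the sum is also a valid kernel, so a Gram matrix built from it could be passed to an SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`).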

International Conference on Information and Automation | 2009

An automatic ontology population with a machine learning technique from semi-structured documents

Hyun-Je Song; Seong-Bae Park; Se-Young Park

The manual design of an ontology usually defines the concepts of the domain, but individual instances of the concepts are often missing, even though they are important for using the ontology as a knowledge base. This is due to the high cost of constructing individuals manually. To tackle this problem, this paper proposes an automatic method for ontology population. The knowledge source for ontology population used in this paper is web tables, whose structure is relatively well organized. Since a web table can be analyzed into a parse tree, the most appropriate concept within the ontology for a given web table is determined by a kernel method, namely a parse tree kernel. The table is then populated as an individual of that concept. In experiments on a large ontology with a great number of concepts, the proposed method achieves 62.35% accuracy on a number of web tables.
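
The concept-selection step can be illustrated as a kernel nearest-exemplar rule. This is an assumption: the abstract only says that a parse tree kernel determines the concept, and the classifier actually used may differ (for example, an SVM). Any kernel can be plugged in; `tag_kernel` is a deliberately simple stand-in over HTML tag counts, not a parse tree kernel, and `best_concept` and the `exemplars` layout are hypothetical.

```python
import math
from collections import Counter

def normalized(kernel, x, y):
    """Cosine-normalize a kernel so that every item has self-similarity 1."""
    denom = math.sqrt(kernel(x, x) * kernel(y, y))
    return kernel(x, y) / denom if denom else 0.0

def best_concept(table, exemplars, kernel):
    """Return the concept whose example tables are, on average, most similar.

    exemplars maps a concept name to tables already known to instantiate it
    (hypothetical training data)."""
    def score(concept):
        examples = exemplars[concept]
        return sum(normalized(kernel, table, ex) for ex in examples) / len(examples)
    return max(exemplars, key=score)

def tag_kernel(t1, t2):
    """Stand-in kernel: dot product of HTML tag counts (not a parse tree kernel)."""
    c1, c2 = Counter(t1), Counter(t2)
    return sum(c1[tag] * c2[tag] for tag in c1)

exemplars = {
    "Person": [["tr", "th", "td", "td", "img"]],
    "Product": [["tr", "td", "td", "td", "td", "a"]],
}
print(best_concept(["tr", "th", "td", "img"], exemplars, tag_kernel))  # Person
```

Normalizing the kernel keeps large tables from dominating simply because they have more nodes; once a concept is chosen, the table would be added as an individual of it.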

Engineering Applications of Artificial Intelligence | 2013

An application for plagiarized source code detection based on a parse tree kernel

Jeong Woo Son; Tae-Gil Noh; Hyun-Je Song; Seong-Bae Park

Program plagiarism detection is the task of detecting plagiarized code pairs among a set of source codes. In this paper, we propose a code plagiarism detection system that uses a parse tree kernel. Our parse tree kernel computes a similarity value between two source codes in terms of the similarity of their parse trees. Since parse trees contain the essential syntactic structure of source code, the system effectively handles structural information. The contributions of this paper are twofold. First, we propose a parse tree kernel that is optimized for program source code; the evaluation shows that our system based on this kernel outperforms well-known baseline systems. Second, we collected a large number of real-world Java source codes from a university programming class. This test set was manually analyzed and tagged by two independent human annotators to mark plagiarized codes, and it can be used to evaluate the performance of various detection systems in real-world environments. Experiments on the test set show that the performance of our plagiarism detection system reaches 93% of the level of the human annotators.
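
The core idea, comparing programs through their parse trees rather than their text, can be tried on Python code with the standard `ast` module. This is a sketch under stated assumptions: the paper targets Java and uses its own optimized kernel, whereas this uses Python's parser as a stand-in and a generic subset-tree kernel over node types only; the helper names and the decay `lam` are hypothetical.

```python
import ast
import math
from itertools import product

def to_tree(node):
    """Convert a Python AST into (node_type, [children]); identifiers are dropped."""
    return type(node).__name__, [to_tree(c) for c in ast.iter_child_nodes(node)]

def nodes(tree):
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def production(node):
    return node[0], tuple(child[0] for child in node[1])

def fragments(n1, n2, lam):
    """Weighted count of common fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1], n2[1]):
        score *= 1.0 + fragments(c1, c2, lam)
    return score

def kernel(t1, t2, lam=0.5):
    return sum(fragments(a, b, lam) for a, b in product(nodes(t1), nodes(t2)))

def similarity(src1, src2, lam=0.5):
    """Normalized parse tree kernel between two Python sources, in [0, 1]."""
    t1, t2 = to_tree(ast.parse(src1)), to_tree(ast.parse(src2))
    return kernel(t1, t2, lam) / math.sqrt(kernel(t1, t1, lam) * kernel(t2, t2, lam))

original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def add_all(items):\n    acc = 0\n    for it in items:\n        acc += it\n    return acc\n"
different = "def total(xs):\n    return sum(xs)\n"
print(similarity(original, renamed), similarity(original, different))
```

Because only node types are kept, renaming every identifier leaves the score at 1.0 (up to rounding), which is the kind of superficial edit a plagiarism detector should see through; a working detector would still need a decision threshold, which is not specified here.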

International Conference on Information Technology: New Generations | 2011

Two-Step Sentence Extraction for Summarization of Meeting Minutes

Jae-Kul Lee; Hyun-Je Song; Seong-Bae Park

Nowadays, the meeting minutes of many organizations are publicly available, and interest in these documents is increasing. However, it is time-consuming and tedious to read and understand entire documents, even when they are easily accessible. Moreover, what most people want from meeting minutes is to grasp the main issues of a meeting and understand their context, rather than to follow the whole discussion. Existing text summarization techniques often fail on this problem because they were developed without considering the characteristics of meeting minutes. To improve the summarization of meeting minutes, this paper proposes a novel method based on two-step sentence extraction. The first step extracts the sentences that address the main issues. In the second step, for each issue expressed in the extracted sentences, the sentences related to that issue are extracted. By transforming the extracted sentences into a tree structure, the method produces results that are easier to understand than those of existing methods. In experiments, the proposed method shows a remarkable improvement in performance, which suggests that it is a plausible approach to summarizing meeting minutes.
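
The two extraction steps and the resulting issue tree can be sketched as follows. Every scoring choice here is a hypothetical assumption, since the abstract does not state how issues or related sentences are detected: sentence similarity is Jaccard word overlap, step one picks the `n_issues` sentences most similar to the rest of the document, and step two attaches each remaining sentence to its most similar issue when the overlap reaches `threshold`.

```python
def words(sentence):
    """Lowercased word set with basic punctuation removed."""
    return set(sentence.lower().replace(".", "").replace(",", "").split())

def overlap(s1, s2):
    """Jaccard word overlap, a hypothetical stand-in for sentence similarity."""
    w1, w2 = words(s1), words(s2)
    return len(w1 & w2) / len(w1 | w2) if w1 | w2 else 0.0

def two_step_summary(sentences, n_issues=2, threshold=0.2):
    # Step 1 (hypothetical criterion): issue sentences are those most
    # similar to the rest of the document.
    def centrality(i):
        return sum(overlap(sentences[i], s) for j, s in enumerate(sentences) if j != i)
    issues = sorted(range(len(sentences)), key=centrality, reverse=True)[:n_issues]

    # Step 2: attach each remaining sentence to its most related issue,
    # giving a two-level tree {issue sentence: [related sentences]}.
    tree = {sentences[i]: [] for i in issues}
    for j, s in enumerate(sentences):
        if j in issues:
            continue
        best = max(issues, key=lambda i: overlap(sentences[i], s))
        if overlap(sentences[best], s) >= threshold:
            tree[sentences[best]].append(s)
    return tree

minutes = [
    "The budget for the new library was discussed.",
    "The library budget will increase next year.",
    "Parking near the city hall is a problem.",
    "New parking spaces near city hall were proposed.",
    "Lunch was served at noon.",
]
for issue, related in two_step_summary(minutes).items():
    print(issue, related)
```

On this toy input the budget and parking sentences become the two issues, each with one related sentence, and the off-topic lunch sentence is dropped; attaching each sentence to a single issue is what keeps the output a tree.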

Intelligent Information Systems | 2017

A just-in-time keyword extraction from meeting transcripts using temporal and participant information

Hyun-Je Song; Junho Go; Seong-Bae Park; Se-Young Park; Kweon Yang Kim

In a meeting, it is often desirable to extract keywords from each utterance as soon as it is spoken. This paper therefore proposes a just-in-time keyword extraction method for meeting transcripts. The proposed method considers three major factors that distinguish it from keyword extraction from ordinary texts. The first factor is the temporal history of the preceding utterances, which grants higher importance to recent utterances than to older ones; the second is topic relevance, which focuses only on the preceding utterances relevant to the current utterance. The final factor is the participants: utterances spoken by the current speaker should be considered more important than those spoken by other participants. The proposed method considers these factors simultaneously within a graph-based keyword extraction framework using several graph operations. Experiments on two data sets, in English and Korean, show that considering these factors improves the performance of keyword extraction from meeting transcripts.
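
The three factors can be combined in a small graph-based ranker. This is a minimal sketch, not the paper's method: the exponential `decay`, the `speaker_boost` multiplier, the shared-word relevance test, and the use of weighted degree centrality (rather than whatever graph operations the paper applies, possibly a PageRank-style ranking) are all hypothetical choices.

```python
from collections import defaultdict
from itertools import combinations

def utterance_weights(history, current, decay=0.8, speaker_boost=2.0):
    """Weight each preceding utterance by recency, topic relevance, and speaker.

    history: list of (speaker, words), oldest first.
    current: (speaker, words) for the utterance just spoken."""
    cur_speaker, cur_words = current
    weights = []
    for age, (speaker, words) in enumerate(reversed(history), start=1):
        if not set(words) & set(cur_words):  # topic relevance: skip unrelated
            weights.append(0.0)
            continue
        w = decay ** age                      # temporal history: older counts less
        if speaker == cur_speaker:            # participant information
            w *= speaker_boost
        weights.append(w)
    return list(reversed(weights))

def keywords(history, current, k=2, **params):
    """Rank the current utterance's words by weighted degree in a co-occurrence graph."""
    weights = utterance_weights(history, current, **params) + [1.0]  # current: weight 1
    utterances = [words for _, words in history] + [current[1]]
    edges = defaultdict(float)
    for words, w in zip(utterances, weights):
        for u, v in combinations(sorted(set(words)), 2):
            edges[(u, v)] += w
    degree = defaultdict(float)
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
    return sorted(set(current[1]), key=lambda word: (-degree[word], word))[:k]

history = [
    ("A", ["budget", "library", "cost"]),
    ("B", ["parking", "hall"]),
    ("A", ["budget", "cost", "tax"]),
]
print(keywords(history, ("A", ["budget", "tax", "meeting"])))  # ['budget', 'tax']
```

Because the weights depend only on utterances already spoken, keywords can be emitted immediately after each utterance, which is the just-in-time requirement.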

Mathematical Problems in Engineering | 2015

Computation of Program Source Code Similarity by Composition of Parse Tree and Call Graph

Hyun-Je Song; Seong-Bae Park; Se Young Park

This paper proposes a novel method to compute the similarity between two program source codes. Since program source code has a structural form, the proposed method adopts convolution kernel functions as a similarity measure. In fact, program source code carries two kinds of structural information: syntactic information and the dependencies among the function calls in the program. Since the syntactic information of a program is expressed as its parse tree, the syntactic similarity between two programs is computed by a parse tree kernel. The function calls within a program provide its global structure and can be represented as a graph, so the similarity of function calls is computed with a graph kernel. Both structural similarities are then reflected simultaneously when comparing program source codes, by composing the parse tree kernel and the graph kernel based on cyclomatic complexity. Experimental results on a real data set for program plagiarism detection show that the proposed method is effective in capturing the similarity between programs: plagiarized pairs of programs are found correctly and thoroughly.
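
Cyclomatic complexity itself is McCabe's M = E - N + 2P for a control-flow graph with E edges, N nodes, and P connected components. The sketch below computes it and then mixes two precomputed, normalized kernel values using it. The mixing rule in `composite_similarity` (leaning on the call-graph kernel more for more complex programs) and its `scale` parameter are hypothetical; the abstract does not say how the complexity enters the composition.

```python
def cyclomatic_complexity(nodes, edges):
    """McCabe's M = E - N + 2P for a control-flow graph with P connected parts."""
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)
    components = len({find(n) for n in nodes})
    return len(edges) - len(nodes) + 2 * components

def composite_similarity(k_tree, k_graph, complexity1, complexity2, scale=10.0):
    """Hypothetical mix of normalized parse tree and call-graph kernel values."""
    w = min(1.0, (complexity1 + complexity2) / (2 * scale))
    return (1 - w) * k_tree + w * k_graph

# An if/else block: condition -> then/else -> join.
cfg_nodes = ["cond", "then", "else", "join"]
cfg_edges = [("cond", "then"), ("cond", "else"), ("then", "join"), ("else", "join")]
print(cyclomatic_complexity(cfg_nodes, cfg_edges))  # 2
```

A straight-line program has M = 1 and each additional binary branch adds one, so M grows with the amount of control structure, which is a natural signal for deciding how much weight structural comparisons deserve.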

Web Intelligence | 2012

Location Comparison through Geographical Topics

Jeong Woo Son; Yunseok Noh; Hyun-Je Song; Seong-Bae Park

With the increasing interest in location-based services, location comparison has attracted more and more attention. One of the best ways to represent a location is to use the topics that are generated near it. To compare locations through such geographical topics, two conditions need to be met: the topic set should be fixed yet cover various aspects of all possible locations, and the dependencies among geographical topics, which often depend on each other, should be taken into account. This paper proposes Probabilistic Explicit Semantic Analysis (PESA), which meets these conditions. PESA represents a location as a weighted topic vector in which each topic is a Wikipedia concept. The set of Wikipedia concepts is fixed, but its enormous size allows PESA to be used to compare various locations. In addition, link information within Wikipedia articles is used to compute the prior probabilities of topics while considering their dependencies, which enables PESA to model topic dependency. PESA was evaluated on eighteen locations in three distinct geographical categories and compared with LDA and ESA. The experimental results show that PESA outperforms both LDA and ESA in location comparison.
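
The representation can be illustrated with a toy posterior over a handful of concepts. Everything below is a hypothetical sketch, not PESA's actual estimator: the ESA-style score is the fraction of a location's words found in each concept article, the link-based prior of a concept grows with the evidence for the concepts that link to it (a simple way to let related topics support each other), and locations are compared by cosine similarity. The concept vocabularies, links, and location word lists are invented for illustration.

```python
import math

def esa_scores(words, concepts):
    """ESA-style association: fraction of the location's words found in each concept."""
    n = max(len(words), 1)
    return {c: sum(w in vocab for w in words) / n for c, vocab in concepts.items()}

def pesa_vector(words, concepts, links):
    """Hypothetical PESA-style topic vector (posterior over concepts).

    links maps a concept to the concepts whose articles link to it."""
    esa = esa_scores(words, concepts)
    prior = {c: 1.0 + sum(esa.get(src, 0.0) for src in links.get(c, ())) for c in concepts}
    post = {c: prior[c] * esa[c] for c in concepts}
    total = sum(post.values())
    return {c: p / total for c, p in post.items()} if total else post

def cosine(u, v):
    dot = sum(u[c] * v.get(c, 0.0) for c in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

concepts = {
    "Beach": {"sand", "surf", "sea"},
    "Harbor": {"ship", "sea", "port"},
    "Ski resort": {"snow", "lift", "slope"},
}
links = {"Harbor": ["Beach"], "Beach": ["Harbor"]}
port_city = pesa_vector(["sea", "sand", "port", "ship"], concepts, links)
coast_town = pesa_vector(["sea", "surf", "sand"], concepts, links)
mountain_town = pesa_vector(["snow", "slope", "lift"], concepts, links)
print(cosine(port_city, coast_town), cosine(port_city, mountain_town))
```

Because every location is expressed over the same fixed concept set, any two vectors are directly comparable, which is the first condition named in the abstract.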

Asian Conference on Machine Learning | 2009

Coping with Distribution Change in the Same Domain Using Similarity-Based Instance Weighting

Jeong Woo Son; Hyun-Je Song; Seong-Bae Park; Se-Young Park

Lexicons are considered the most crucial features in natural language processing (NLP) and are thus often used in machine learning algorithms applied to NLP tasks. However, owing to the diversity of the lexical space, machine learning algorithms with lexical features suffer from differences between the distributions of training and test data. To overcome this distribution change, this paper proposes support vector machines with example-wise weights: by weighting training examples according to their similarity to all test data, the training distribution is made to coincide with the test distribution. Experimental results on text chunking show that the distribution change between training and test data does occur, and that the proposed method, which accounts for this change in its training phase, outperforms ordinary support vector machines.
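
The example-wise weights can be sketched as the mean similarity of each training example to the whole test set. The Gaussian (RBF) similarity, its `gamma`, and the rescaling to an average weight of 1 are assumptions for illustration; with sparse lexical features, a cosine or overlap measure may be more natural, and the paper's exact similarity is not given in the abstract.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian similarity between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def instance_weights(train, test, gamma=1.0):
    """Weight each training example by its mean similarity to all test examples,
    rescaled so that the weights average to 1."""
    raw = [sum(rbf(x, t, gamma) for t in test) / len(test) for x in train]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

train = [[0.0], [5.0]]
test = [[0.1], [0.2]]
print(instance_weights(train, test))  # the first example gets almost all the weight
```

The resulting weights could be passed to a weighted SVM, for example scikit-learn's `SVC.fit(X, y, sample_weight=weights)`, so that training examples resembling the test data cost more to misclassify.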

Advances in Social Networks Analysis and Mining | 2013

Identifying user attributes through non-i.i.d. multi-instance learning

Hyun-Je Song; Jeong Woo Son; Seong-Bae Park

User attributes are an essential factor in personalized recommendation and targeted advertising. Therefore, a number of studies have attempted to identify user attributes automatically from SNS postings, since the postings reveal various attributes of their writers. Many kinds of machine learning methods have been applied to automatic identification of user attributes, but they suffer from two major problems. First, many SNS postings do not deliver any information about their writers, so a model learned from SNS postings is biased by these irrelevant postings. Second, the postings of an SNS user are somewhat related to one another, but most machine learning methods ignore this information, since they assume that data are independently and identically distributed. To solve these problems in user attribute identification, this paper proposes a novel method based on non-i.i.d. multi-instance learning. Since multi-instance learning treats all postings by a user as a bag and learns user attribute identification from such bags rather than from individual postings, the first problem is solved. In addition, the proposed method assumes that the postings by a single user have a structure; incorporating this assumption into multi-instance learning solves the second problem. Our experimental results show that addressing these two problems in automatic user attribute identification improves performance.
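
One way to treat the postings in a bag as non-i.i.d. is a bag-level kernel that down-weights near-duplicate instances. The sketch below follows the spirit of the miGraph kernel of Zhou, Sun and Li (2009), swapped in here as an illustration; it is not necessarily the method of this paper. Postings are assumed to be numeric feature vectors, and `delta` (the distance under which two postings count as connected) and `gamma` are hypothetical parameters.

```python
import math

def rbf(x, y, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def instance_weights(bag, delta):
    """Down-weight postings that have many near-duplicates in the same bag.

    Instances closer than delta are treated as connected; each instance gets
    1 / (number of instances connected to it, itself included)."""
    return [1.0 / sum(1 for y in bag if math.dist(x, y) < delta) for x in bag]

def bag_kernel(bag1, bag2, delta=0.5, gamma=1.0):
    """Weighted average of instance-level similarities between two users' bags."""
    w1, w2 = instance_weights(bag1, delta), instance_weights(bag2, delta)
    total = sum(
        a * b * rbf(x, y, gamma)
        for x, a in zip(bag1, w1)
        for y, b in zip(bag2, w2)
    )
    return total / (sum(w1) * sum(w2))

user_a = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]  # one posting repeated twice
user_b = [[0.0, 0.0], [1.0, 1.0]]
print(instance_weights(user_a, 0.5))  # [0.5, 0.5, 1.0]
```

The repeated posting in `user_a` receives a combined weight equal to that of a single posting, so a user who posts the same content many times is not over-represented; the kernel value can then feed any kernel classifier at the bag (user) level.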
