
Jinsuk Kim

Korea Institute of Science and Technology Information

Publication

Featured research published by Jinsuk Kim.

Journal of Information Processing Systems | 2009

Automatic In-Text Keyword Tagging based on Information Retrieval

Jinsuk Kim; Du-Seok Jin; Kwang-Young Kim; Ho-Seop Choe

As Wikipedia shows, tagging or cross-linking major keywords in a document collection improves not only the readability of documents but also responsive and adaptive navigation among related documents. In recent years, the Semantic Web has increased the importance of social tagging as a key feature of Web 2.0, and the tag cloud, its most visible form, has emerged into public view. In this paper we provide an efficient method of automated in-text keyword tagging based on a large-scale controlled term collection or keyword dictionary. The computational complexity of O(mN) when a pattern matching algorithm is used can be reduced to O(m log N) when an Information Retrieval technique is adopted, where m is the length of the target document and N is the total number of candidate terms to be tagged. The results show that automatic in-text tagging with keywords filtered by Information Retrieval is about 6 to 40 times faster than the fastest pattern matching algorithm.
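
The complexity claim in this abstract can be illustrated with a small sketch. The abstract does not describe the Information Retrieval technique itself, so the code below swaps in a plain binary search over a sorted term list as a stand-in: each lookup costs O(log N), and a bounded number of lookups per word position gives roughly O(m log N) overall. The function names, the `[[...]]` tag markup, and the `max_words` limit are all assumptions made for illustration.

```python
from bisect import bisect_left


def build_dictionary(terms):
    """Return a sorted list of lowercase terms (a hypothetical keyword dictionary)."""
    return sorted({t.lower() for t in terms})


def contains(sorted_terms, candidate):
    """Binary search: O(log N) comparisons per lookup."""
    i = bisect_left(sorted_terms, candidate)
    return i < len(sorted_terms) and sorted_terms[i] == candidate


def tag_keywords(text, sorted_terms, max_words=3):
    """Wrap the longest dictionary term starting at each word in [[...]]."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n]).lower()
            if contains(sorted_terms, candidate):
                out.append("[[" + " ".join(words[i:i + n]) + "]]")
                i += n
                matched = True
                break
        if not matched:
            out.append(words[i])
            i += 1
    return " ".join(out)


terms = build_dictionary(["semantic web", "tag cloud", "web"])
tagged = tag_keywords("The Semantic Web made the tag cloud popular", terms)
```

A naive alternative would scan the document once per dictionary term, which is where the O(mN) cost comes from; the sketch avoids that by asking the dictionary about each position instead.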

Journal of Information Processing Systems | 2015

WebSHArk 1.0: A Benchmark Collection for Malicious Web Shell Detection

Jinsuk Kim; Dong-Hoon Yoo; Heejin Jang; Kimoon Jeong

Web shells are programs that are written for a specific purpose in Web scripting languages, such as PHP, ASP, ASP.NET, JSP, PERL-CGI, etc. Web shells provide a means to communicate with the server's operating system via the interpreter of the web scripting language. Hence, web shells can execute OS-specific commands over HTTP. Usually, malicious users carry out web attacks by uploading one of these web shells to compromise the target web servers. Though there have been several approaches to detecting such malicious web shells, no standard dataset has been built to compare various web shell detection techniques. In this paper, we present a collection of web shell files, WebSHArk 1.0, as a standard dataset for current and future studies in malicious web shell detection. To provide baseline results for future studies and for the improvement of current tools, we also present some benchmark results obtained by scanning the WebSHArk dataset directory with three web shell scanning tools that are publicly available on the Internet. The WebSHArk 1.0 dataset is only available upon request via email to one of the authors, due to security and legal issues.

Journal of Computing Science and Engineering | 2009

HKIB-20000 & HKIB-40075: Hangul Benchmark Collections for Text Categorization Research

Jinsuk Kim; Ho-Seop Choe; Beom-Jong You; Jeong-Hyun Seo; Suk-Hoon Lee; Dong-Yul Ra

The HKIB, or Hankookilbo, test collections are two archives of Korean newswire stories manually categorized with semi-hierarchical or hierarchical category taxonomies. The base newswire stories were made available by the Hankook Ilbo (The Korea Daily) for research purposes. First, Chungnam National University and KISTI collaborated to manually tag 40,075 news stories with categories using a semi-hierarchical and balanced three-level classification scheme, where each news story has only one level-3 category (single-labeling). We refer to this original data set as the HKIB-40075 test collection. Then Yonsei University and KISTI collaborated to select 20,000 newswire stories from the HKIB-40075 test collection, to rearrange the classification scheme to be fully hierarchical but unbalanced, and to assign one or more categories to each news story (multi-labeling). We refer to this modified data set as the HKIB-20000 test collection. We benchmark a k-NN categorization algorithm on both HKIB-20000 and HKIB-40075, illustrating properties of the collections, providing baseline results for future studies, and suggesting new directions for further research on the Korean text categorization problem.

International Conference on Advanced Language Processing and Web Information Technology | 2007

Toward DB-IR Integration: Per-Document Basis Transactional Index Maintenance

Jinsuk Kim; Du-Seok Jin; Yun-Soo Choi; Chang-Hoo Jeong; Kwang-Young Kim; Sung-Pil Choi; Min-Ho Lee; Min-Hee Cho; Ho-Seop Choe; Hwa-Mook Yoon; Jeong-Hyun Seo

While information retrieval (IR) and databases (DB) have been developed independently, there are emerging requirements that both data management and efficient text retrieval be supported simultaneously in information systems such as health care systems, bulletin boards, XML data management, and digital libraries. Recently, DB-IR integration has emerged as a research issue. The great divide between DB and IR has led to different approaches to index maintenance for newly arriving documents. DB has extended its SQL layer to cope with text fields because it lacks a native mechanism for building an IR-like index, whereas IR usually treats a block of new documents as a logical unit of index maintenance since it has no concept of integrity constraints. Toward DB-IR integration, however, a transaction that adds or updates a document should include maintenance of the postings lists associated with that document; hence per-document basis transactional index maintenance. In this paper, we evaluate the performance of a few strategies for per-document basis transactions when inserting documents: direct index update, stand-alone auxiliary index, and pulsing auxiliary index. The results of tests on the KRISTAL-IRMS show that the pulsing auxiliary strategy, where long postings lists in the auxiliary index are updated in place in the main index whereas short lists are updated directly in the auxiliary index, can be a promising candidate for text field indexing in DB-IR integration.
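
The pulsing auxiliary strategy named in this abstract can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not the KRISTAL-IRMS implementation: the class name `PulsingIndex`, the fixed `threshold`, and whitespace tokenization are all hypothetical, and transactions and on-disk layout are not modeled.

```python
from collections import defaultdict


class PulsingIndex:
    """Toy inverted index with a main index and a small auxiliary index."""

    def __init__(self, threshold=3):
        self.threshold = threshold      # assumed cutoff for a "long" list
        self.main = defaultdict(list)   # term -> postings in the main index
        self.aux = defaultdict(list)    # term -> postings awaiting transfer

    def add_document(self, doc_id, text):
        """Index one document: short lists stay in aux, long lists pulse to main."""
        for term in sorted(set(text.lower().split())):
            postings = self.aux[term]
            postings.append(doc_id)
            if len(postings) >= self.threshold:
                # The auxiliary list has grown long: move it to the main index.
                self.main[term].extend(postings)
                del self.aux[term]

    def search(self, term):
        """Return postings from both indexes, main index first."""
        term = term.lower()
        return self.main.get(term, []) + self.aux.get(term, [])


idx = PulsingIndex(threshold=2)
idx.add_document(1, "index update")
idx.add_document(2, "index merge")
idx.add_document(3, "index")
```

In this sketch the frequent term `index` is flushed to the main index after its second posting, while rare terms such as `merge` remain in the auxiliary index; a query has to consult both.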

Parallel and Distributed Computing: Applications and Technologies | 2007

Service-Centric Object Fragmentation for Efficient Retrieval and Management of Huge XML Documents

Chang-Hoo Jeong; Yun-Soo Choi; Du-Seok Jin; Min-Ho Lee; Sung-Pil Choi; Kwang-Young Kim; Min-Hee Cho; Won-Kyun Joo; Hwa-Mook Yoon; Jeong-Hyun Seo; Jinsuk Kim

The vast amount of XML documents raises interest in how they will be used and how far their usage can be expanded. This paper has two central goals: 1) easy and fast retrieval of XML documents or relevant elements; and 2) efficient and stable management of large XML documents. The keys to developing such a practical system are how to segment a large XML document into smaller fragments and how to store them. In order to achieve these goals, we designed the SCOF (Service-centric Object Fragmentation) model, which is a semi-decomposition method based on conversion rules provided by XML database managers. Keyword-based search using the SCOF model then retrieves the specific elements or attributes of XML documents, just as a typical XML query language does. Even though this approach relies on the expertise of the managers of the XML document collection, the SCOF model makes both retrieval and management of massive XML documents efficient.

The Journal of the Korea Contents Association | 2012

Implementation of Electronic Document Local Hosting System of Overseas Journals

Kwang-Young Kim; Jinsuk Kim; Jung-Hoon Park; Jeong-Hwan Kim

Today, as Internet and electronic publishing technologies have matured, there are many benefits to using academic information available as high-quality electronic documents. Many researchers need access to the full text of electronic documents. There has also been a need for digital archiving activities between countries so that each other's electronic documents can be safely preserved and served. In this paper, we implement an electronic document local-hosting system that provides free service to KESLI member institutions and pay-per-view service to individual users and small and medium-sized companies that are not KESLI members, and we build a national long-term preservation system for electronic information resources.

The Journal of the Korea Contents Association | 2010

Efficient Dynamic Index Structure for SSD (SPM)

Du-Seok Jin; Jinsuk Kim; Beom-Jong You; Hoe-Kyung Jung

Inverted index structures have become the most efficient data structure for high-performance indexing of large text collections. For online index maintenance in particular, in-place and merge-based index structures are the two main competing strategies for index construction in dynamic search environments. In both strategies, contiguity of posting information is the mainstay of the design for online index maintenance and query time. With the emergence of new storage devices (SSD, SCRAM), contiguity of posting information no longer has to be considered in the design of index structures, because of advantages such as low access latency and high I/O throughput. Even so, the SSD (Solid State Drive) is not well suited to traditional inverted structures because of its poor random write throughput in practical systems. In this paper, we propose a new efficient online index structure (SPM) for SSDs that significantly reduces query time and improves index maintenance performance.

Parallel and Distributed Computing: Applications and Technologies | 2007

The Implementation of Distributed Retrieval System for a Large Number of Collections

Kwang Young Kim; Jinsuk Kim; Seok-Hyong Lee; Min-Ho Lee; Sung-Pil Choi; Yun-Soo Choi; Du-Seok Jin; Min-Hee Cho; Chang-Hoo Jeong; Nam-Gyu Kang; Ho-Seop Choe; Jerry Seo; Hwa-Mook Yoon

With the development of Internet technologies, the Internet has come to consist of a large and complex body of Web documents, science and technology documents, databases, and so on. A distributed retrieval system is increasingly required to support effective retrieval and management of this large amount of material. Such a system has to let users search quickly and accurately, and it has to let DB managers manage it easily. We have therefore developed a distributed retrieval system called dKRISTAL, which finds index files and manages the database system in real time. The new dKRISTAL system supports both searching and managing databases, and it manages documents effectively in real time. We measured the integrated search speed of the distributed retrieval system in an experiment that applied dKRISTAL to a large amount of science and technology information and Web documents, and we analyze the results of that experiment.

Computational Intelligence for Modelling, Control and Automation | 2005

An Improvement of Information Retrieval System Using Word Location Information

Kwang Young Kim; Du Seok Jin; Yoonsoo Choi; Jinsuk Kim; Young-Kyoon Suh; Jerry Seo

Today it is difficult to search the Internet effectively because it is a huge collection of documents. Currently most people search for documents by typing just a few keywords, so these keywords are very important for searching documents. This paper suggests that word location information is an important element in finding relevant documents, and that using the word location information of a user query can improve the results of an information retrieval system. We report direct experiments on a retrieval system with various weighting methods for word location information.

The KIPS Transactions: Part B | 2005

Automatic Text Categorization Using Passage-based Weight Function and Passage Type

Won-Kyun Joo; Jinsuk Kim; Kiseok Choi

Research in text categorization has been confined to whole-document-level classification, probably due to a lack of full-text test collections. However, the full-length documents available today in large quantities have renewed interest in text classification. A document is usually written in an organized structure to present its main topic(s). This structure can be expressed as a sequence of sub-topic text blocks, or passages. In order to reflect the sub-topic structure of a document, we propose a new passage-level or passage-based text categorization model, which segments a test document into several passages, assigns categories to each passage, and merges passage categories into document categories. Compared with traditional document-level categorization, two additional steps, passage splitting and category merging, are required in this model. Using four subsets of the Reuters text categorization test collection and a full-text test collection whose documents vary from tens to hundreds of kilobytes, we evaluated the proposed model, especially the effectiveness of various passage types and the importance of passage location in category merging. Our results show that simple windows are best for all test collections used in these experiments. We also found that passages contribute to the main topic(s) to different degrees, depending on their location in the test document.
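
The three steps described in this abstract (passage splitting, per-passage category assignment, and category merging) can be sketched as below. This is only an illustration under assumptions: the keyword-overlap scorer stands in for whatever classifier the paper uses, and the fixed window `size`, the `first_weight` location weight, and all function names are hypothetical.

```python
from collections import Counter


def split_windows(words, size):
    """Split a word list into simple non-overlapping windows of `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def score_passage(passage, category_keywords):
    """Toy per-passage scorer: count keyword hits for each category."""
    counts = Counter(w.lower() for w in passage)
    return {cat: sum(counts[k] for k in kws)
            for cat, kws in category_keywords.items()}


def categorize(text, category_keywords, size=5, first_weight=2.0):
    """Merge passage scores into one document category, weighting by location."""
    passages = split_windows(text.split(), size)
    totals = Counter()
    for pos, passage in enumerate(passages):
        # Assumed location weighting: the first passage counts more.
        weight = first_weight if pos == 0 else 1.0
        for cat, score in score_passage(passage, category_keywords).items():
            totals[cat] += weight * score
    if not totals:
        return None  # empty document: nothing to categorize
    return max(totals, key=totals.get)


keywords = {"sports": {"goal", "match"}, "finance": {"stock", "bank"}}
doc = "The match ended with a late goal while bank stock fell"
label = categorize(doc, keywords)
```

Here the first window mentions `match`, so its doubled weight tips the merged score toward `sports` even though the second window contains more finance keywords, which mirrors the abstract's point that passage location affects category merging.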
