Yuen Hsien Tseng | Researchain

Publication

Featured research published by Yuen Hsien Tseng.

Information Processing and Management | 2007

Text mining techniques for patent analysis

Yuen Hsien Tseng; Chi Jen Lin; Yu I. Lin

Patent documents contain important research results. However, they are lengthy and rich in technical terminology, so analyzing them takes considerable human effort. Automatic tools for assisting patent engineers or decision makers in patent analysis are in great demand. This paper describes a series of text mining techniques that conform to the analytical process used by patent analysts. These techniques include text segmentation, summary extraction, feature selection, term association, cluster generation, topic identification, and information mapping. The issues of efficiency and effectiveness are considered in the design of these techniques. Some important features of the proposed methodology include a rigorous approach to verify the usefulness of segment extracts as the document surrogates, a corpus- and dictionary-free algorithm for keyphrase extraction, an efficient co-word analysis method that can be applied to large volumes of patents, and an automatic procedure to create generic cluster titles for ease of result interpretation. Evaluation of these techniques was conducted. The results confirm that the machine-generated summaries do preserve more important content words than some other sections for classification. To demonstrate the feasibility, the proposed methodology was applied to a real-world patent set for domain analysis and mapping, which shows that our approach is more effective than existing classification systems. The attempt in this paper to automate the whole process not only helps create final patent maps for topic analyses, but also facilitates or improves other patent analysis tasks such as patent classification, organization, knowledge sharing, and prior art searches.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 1999

Content-based retrieval for music collections

Yuen Hsien Tseng

A content-based retrieval model for tackling the mismatch problems specific to music data is proposed and implemented. The system uses a pitch profile encoding for queries in any key and an n-note indexing method for approximate matching in sub-linear time. A distinct function that extracts key melodies for query suggestion is developed. The Web-based system provides a flexible user interface for query formulation and result browsing. Users can search the system by a short sequence of notes, by uploading a file created by singing, or by clicking suggested key melodies without input. Experiments show that the pitch profile encoding and 3-note indexing are able to overcome the key mismatch problem and the random errors caused by pitch error, note deletion and insertion. The use of extracted key melodies improves performance over direct search of the music database. For the type of burst mismatch, a query expansion approach is applied.
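The idea of key-independent encoding plus n-note indexing can be illustrated with a minimal sketch. It assumes that a "pitch profile" can be approximated by the sequence of semitone intervals between consecutive notes, and that songs are ranked by the number of shared 3-note keys; both are simplifications for illustration, not the paper's exact encoding or ranking. All function names are hypothetical.

```python
from collections import defaultdict

def pitch_profile(midi_notes):
    """Encode a melody as successive semitone intervals (assumed encoding).

    Transposing a melody to another key leaves the intervals unchanged.
    """
    return tuple(b - a for a, b in zip(midi_notes, midi_notes[1:]))

def n_note_keys(midi_notes, n=3):
    """Yield the interval profile of every window of n consecutive notes."""
    for i in range(len(midi_notes) - n + 1):
        yield pitch_profile(midi_notes[i:i + n])

def build_index(songs, n=3):
    """Map each n-note key to the set of song ids that contain it."""
    index = defaultdict(set)
    for song_id, notes in songs.items():
        for key in n_note_keys(notes, n):
            index[key].add(song_id)
    return index

def search(index, query_notes, n=3):
    """Rank songs by how many distinct query keys they share (assumed scoring)."""
    scores = defaultdict(int)
    for key in set(n_note_keys(query_notes, n)):
        for song_id in index.get(key, ()):
            scores[song_id] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy collection, given as MIDI note numbers.
songs = {"a": [60, 62, 64, 65, 67], "b": [60, 60, 67, 67, 69]}
index = build_index(songs)
# The query is the opening of song "a" transposed up two semitones.
print(search(index, [62, 64, 66, 67]))
```

Because each lookup touches only the songs that share a key with the query, a single wrong or missing note spoils only the windows that contain it, which is one way to read the tolerance to random errors described above.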

Journal of the Association for Information Science and Technology | 2002

Automatic thesaurus generation for Chinese documents

Yuen Hsien Tseng

This article reports an approach to automatic thesaurus construction for Chinese documents. An effective Chinese keyword extraction algorithm is first presented. Experiments showed that for each document an average of 33% keywords unknown to a lexicon of 123,226 terms could be identified by this algorithm. Of these unregistered words, only 8.3% are illegal. Keywords extracted from each document are further filtered for term association analysis. Association weights larger than a threshold are then accumulated over all the documents to yield the final term pair similarities. Compared to previous studies, this method speeds up the thesaurus generation process drastically. It also achieves a similar percentage level of term relatedness.
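The accumulation step can be sketched as follows. The abstract does not specify the association measure, so this sketch assumes a Dice coefficient over sentence-level co-occurrence within each document; the threshold value and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def doc_associations(sentences):
    """Dice coefficient for each keyword pair within one document.

    `sentences` is a list of keyword lists, one per sentence (assumed input).
    """
    freq = defaultdict(int)
    cofreq = defaultdict(int)
    for sent in sentences:
        terms = sorted(set(sent))
        for term in terms:
            freq[term] += 1
        for pair in combinations(terms, 2):
            cofreq[pair] += 1
    return {
        pair: 2 * count / (freq[pair[0]] + freq[pair[1]])
        for pair, count in cofreq.items()
    }

def accumulate(docs, threshold=0.5):
    """Sum per-document weights above the threshold over all documents."""
    totals = defaultdict(float)
    for sentences in docs:
        for pair, weight in doc_associations(sentences).items():
            if weight > threshold:
                totals[pair] += weight
    return dict(totals)

# Two hypothetical documents with already-extracted keywords.
docs = [
    [["a", "b"], ["a", "b"], ["a", "c"]],
    [["a", "b"]],
]
similarities = accumulate(docs)
```

Computing associations one document at a time keeps the pair table small, which is a plausible reason such a scheme scales better than building a global co-occurrence matrix.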

Scientometrics | 2009

A comparison of methods for detecting hot topics

Yuen Hsien Tseng; Yu I. Lin; Yi Yang Lee; Wen Chi Hung; Chun Hsiang Lee

In scientometric trend analysis, parameter choices for observing trends were often made ad hoc in past studies. For example, different year spans might be used to create the time sequence, and different indices were chosen for trend observation. However, the effectiveness of these choices was hardly known, quantitatively and comparatively. This work provides clues to better interpret the results when a certain choice was made. Specifically, by sorting research topics in decreasing order of interest predicted by a trend index and then by evaluating this ordering based on information retrieval measures, we compare a number of trend indices (percentage of increase vs. regression slope), trend formulations (simple trend vs. eigen-trend), and options (various year spans and durations for prediction) in different domains (safety agriculture and information retrieval) with different collection scales (72500 papers vs. 853 papers) to know which one leads to better trend observation. Our results show that the slope of linear regression on the time series performs consistently better than the others. More interestingly, this index is robust under different conditions and is hardly affected even when the collection was split into arbitrary (e.g., only two) periods. Implications of these results are discussed. Our work not only provides a method to evaluate trend prediction performance for scientometrics, but also offers insights and reflections for past and future trend observation studies.
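The two trend indices compared above can be illustrated with a small sketch. The regression slope is the ordinary least-squares slope of per-period counts against the period index; the percentage-of-increase formula shown here (relative change from the first to the last period) is an assumed definition, since the abstract does not give one. Function names and topic data are hypothetical.

```python
def regression_slope(counts):
    """Least-squares slope of counts against period index 0, 1, 2, ..."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two periods")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx

def percentage_increase(counts):
    """Relative change from first to last period (assumed definition)."""
    if len(counts) < 2 or counts[0] == 0:
        raise ValueError("need at least two periods and a nonzero first count")
    return (counts[-1] - counts[0]) / counts[0]

def rank_topics(series, index=regression_slope):
    """Sort topic names in decreasing order of the chosen trend index."""
    return sorted(series, key=lambda topic: index(series[topic]), reverse=True)

# Hypothetical yearly paper counts per topic.
series = {"x": [1, 2, 3, 4], "y": [5, 5, 5, 5], "z": [4, 3, 2, 1]}
print(rank_topics(series))
```

The ranked list is what would then be scored with information retrieval measures against a list of topics known to be hot.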

International ACM SIGIR Conference on Research and Development in Information Retrieval | 1998

Multilingual keyword extraction for term suggestion

Yuen Hsien Tseng

Users of information retrieval systems often input queries containing terms that do not match the terms used to index the majority of the relevant documents [1]. This problem of vocabulary mismatch can be effectively alleviated through a strategy known as “term suggestion” [2], an interactive process that allows users to peruse and select terms from those “suggested” by the system. However, to be able to suggest query terms, the system has to extract keywords from the document database, which leads to the problem of keyword extraction.

Asia Information Retrieval Symposium | 2006

Toward generic title generation for clustered documents

Yuen Hsien Tseng; Chi Jen Lin; Hsiu Han Chen; Yu I. Lin

A cluster labeling algorithm for creating generic titles based on external resources such as WordNet is proposed. Our method first extracts category-specific terms as cluster descriptors. These descriptors are then mapped to generic terms based on a hypernym search algorithm. The proposed method has been evaluated on a patent document collection and a subset of the Reuters-21578 collection. Experimental results revealed that our method performs as anticipated. Real-case applications of these generic terms show promise in assisting humans in interpreting the clustered topics. Our method is general enough that it can be easily extended to use other hierarchical resources for adaptable label generation.

Journal of Information Science | 2007

Patent surrogate extraction and evaluation in the context of patent mapping

Yuen Hsien Tseng; Yeong Ming Wang; Yu I. Lin; Chi Jen Lin; Dai Wei Juang

Patent documents contain important research results. They are often collectively analyzed and organized in a visual way to support decision making. However, they are lengthy and rich in technical terminology, and thus require a lot of human effort for analysis. Automatic tools for assisting patent engineers or decision makers in patent analysis are in great demand. This paper describes a summarization method for patent surrogate extraction, intended to efficiently and effectively support patent mapping, which is an important subtask of patent analysis. Six patent maps were used to evaluate its relative usefulness. The experimental results confirm that the machine-generated summaries do preserve more important content words than some other patent sections or even than the full patent texts when only a few terms are to be considered for classification and mapping. The implication is that if one were to determine a patent's category based on only a few terms at a quick pace, one could begin by reading the section summaries generated automatically.

Expert Systems With Applications | 2010

Generic title labeling for clustered documents

Yuen Hsien Tseng

Document clustering is a powerful technique to detect topics and their relations for information browsing, analysis, and organization. However, clustered documents require post-assignment of descriptive titles to help users interpret the results. Existing techniques often assign labels to clusters based only on the terms that the clustered documents contain, which may not be sufficient for some applications. To solve this problem, a cluster labeling algorithm for creating generic titles, based on external resources such as WordNet, is proposed. Our method first extracts category-specific terms as cluster descriptors. These descriptors are then mapped to generic terms based on a hypernym search algorithm. The proposed method has been evaluated on a patent document collection and a subset of the Reuters-21578 collection. Experimental results revealed that our method performs as anticipated. Real-case applications of these generic terms show promise in assisting humans in interpreting the clustered topics. Our method is general enough that it can be easily extended to use other hierarchical resources for adaptable label generation.
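The mapping from cluster descriptors to a generic term can be sketched as a search over hypernym chains. The sketch below substitutes a tiny hand-made child-to-parent table for WordNet and uses a simple vote (most descriptors covered, then most specific) in place of the paper's scoring, which the abstract does not describe. The table contents and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy hierarchy standing in for WordNet: child -> parent.
TOY_HYPERNYMS = {
    "laser": "device", "transistor": "device", "diode": "device",
    "device": "artifact", "artifact": "entity",
    "polymer": "material", "material": "substance", "substance": "entity",
}

def ancestors(term, parents):
    """Return the chain of hypernyms of `term`, nearest first."""
    chain = []
    while term in parents:
        term = parents[term]
        chain.append(term)
    return chain

def generic_title(descriptors, parents):
    """Pick the hypernym covering the most descriptors, preferring specific ones.

    Assumed scoring: more descriptors covered wins; ties go to the
    hypernym with the smallest total distance from the descriptors.
    """
    votes = Counter()
    distance = Counter()
    for descriptor in descriptors:
        for steps, hypernym in enumerate(ancestors(descriptor, parents), start=1):
            votes[hypernym] += 1
            distance[hypernym] += steps
    if not votes:
        return None
    return min(votes, key=lambda h: (-votes[h], distance[h], h))

print(generic_title(["laser", "transistor", "diode"], TOY_HYPERNYMS))
```

Swapping the toy table for another hierarchical resource only requires a different `parents` mapping, which mirrors the extensibility claim in the abstract.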

Patent Information Retrieval | 2008

A study of search tactics for patentability search: a case study on patent engineers

Yuen Hsien Tseng; Yi Jen Wu

The goal of this study is to understand the search tactics that patent engineers apply when they perform patentability searches. We hope that the results can be used to: (1) provide references for novice patent engineers and ordinary users conducting patentability searches; (2) suggest directions for improving future patent search systems.

Public Understanding of Science | 2012

Are you SLiM? Developing an instrument for civic scientific literacy measurement (SLiM) based on media coverage

Carl Johan Rundgren; Shu-Nu Chang Rundgren; Yuen Hsien Tseng; Pei Ling Lin; Chun Yen Chang

The purpose of this study is to develop an instrument to assess civic scientific literacy measurement (SLiM), based on media coverage. A total of 50 multiple-choice items were developed based on the most common scientific terms appearing in media within Taiwan. These questions covered the subjects of biology (45.26%, 22 items), earth science (37.90%, 19 items), physics (11.58%, 6 items) and chemistry (5.26%, 3 items). A total of 1034 students from three distinct groups (7th graders, 10th graders, and undergraduates) were invited to participate in this study. The reliability of this instrument was 0.86 (KR 20). The average difficulty of the SLiM ranged from 0.19 to 0.91, and the discrimination power was 0.1 to 0.59. According to participants’ performances on SLiM, it was revealed that 10th graders (Mean = 37.34±0.23) performed better than both undergraduates (Mean = 33.00±0.33) and 7th graders (Mean = 26.73±0.45) with significant differences in their SLiM.
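For readers unfamiliar with the statistics reported above, the following sketch computes item difficulty (proportion of correct answers per item) and the Kuder-Richardson 20 reliability coefficient from a matrix of 0/1 responses. It uses the population variance of total scores; whether the study used population or sample variance is not stated, so treat that choice as an assumption. The response data are hypothetical.

```python
def item_difficulty(responses):
    """Proportion of respondents answering each item correctly."""
    n = len(responses)
    k = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(k)]

def kr20(responses):
    """Kuder-Richardson 20 for dichotomous (0/1) item responses.

    KR-20 = k / (k - 1) * (1 - sum(p_i * q_i) / var(total scores)),
    using the population variance of total scores (assumed here).
    """
    n = len(responses)
    k = len(responses[0])
    p = item_difficulty(responses)
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    variance = sum((t - mean_total) ** 2 for t in totals) / n
    item_variance = sum(pi * (1 - pi) for pi in p)
    return (k / (k - 1)) * (1 - item_variance / variance)

# Hypothetical responses: 4 respondents x 3 items, 1 = correct.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_difficulty(responses), kr20(responses))
```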
