Publication


Featured research published by Yi-cheng Pan.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Performance Analysis for Lattice-Based Speech Indexing Approaches Using Words and Subword Units

Yi-cheng Pan; Lin-Shan Lee

Lattice-based speech indexing approaches are attractive for the combination of short spoken segments, short queries, and low automatic speech recognition (ASR) accuracies, since lattices provide recognition alternatives and therefore tend to compensate for recognition errors. Position-specific posterior lattices (PSPLs) and confusion networks (CNs), two of the most popular lattice-based approaches, both reduce disk space requirements and are more efficient than raw lattices. When PSPLs and CNs are used in a word-based fashion, however, they cannot handle out-of-vocabulary (OOV) or rare-word queries. In this paper, we propose an efficient approach for constructing subword-based PSPLs (S-PSPLs) and CNs (S-CNs) and present a comprehensive performance analysis of PSPL and CN structures using both words and subword units, covering their basic principles and structures and supported by experimental results on Mandarin Chinese. S-PSPLs and S-CNs are shown to yield significant mean average precision (MAP) improvements over word-based PSPLs and CNs for both OOV and in-vocabulary queries while requiring much less disk space for indexing.
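
The abstract does not include an implementation; as a rough illustration of the kind of structure involved, the minimal Python sketch below treats a PSPL-style index as a mapping from an indexed unit (a word or a subword such as a Mandarin character) to (segment, position, posterior) entries, with a deliberately simplified scoring rule. The class name, scoring function, and example entries are assumptions for illustration, not the paper's method.

```python
from collections import defaultdict

# Minimal sketch of a position-specific posterior lattice (PSPL) style index.
# Names and structure are illustrative assumptions, not the paper's implementation.
# Each indexed unit maps to entries of (segment_id, position, posterior).

class PSPLIndex:
    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, unit, segment_id, position, posterior):
        """Record that `unit` occurs in `segment_id` at `position` with the
        posterior probability accumulated from the lattice."""
        self.postings[unit].append((segment_id, position, posterior))

    def score(self, query_units):
        """Score segments by accumulating the posteriors of the query units.
        A real system would also reward in-order adjacency across position
        bins; this sketch only sums unit posteriors per segment."""
        scores = defaultdict(float)
        for unit in query_units:
            for segment_id, _position, posterior in self.postings.get(unit, []):
                scores[segment_id] += posterior
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Indexing subword units (e.g. characters) lets an OOV query still match,
# because the query can be broken into in-vocabulary subword units.
index = PSPLIndex()
index.add("台", segment_id=3, position=0, posterior=0.8)
index.add("北", segment_id=3, position=1, posterior=0.7)
print(index.score(["台", "北"]))   # segment 3 ranked first
```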


IEEE Automatic Speech Recognition and Understanding Workshop | 2007

Analytical comparison between position specific posterior lattices and confusion networks based on words and subword units for spoken document indexing

Yi-cheng Pan; Hung-lin Chang; Lin-Shan Lee

In this paper we analytically compare the two widely accepted approaches to spoken document indexing, position-specific posterior lattices (PSPL) and confusion networks (CN), in terms of retrieval accuracy and index size. The fundamental distinctions between these two approaches in terms of construction units, posterior probabilities, number of clusters, indexing coverage, and space requirements are discussed in detail. A new approach to approximating subword posterior probabilities in a word lattice is also incorporated into PSPL/CN to handle OOV and rare-word queries, which were not addressed in the original PSPL and CN approaches. Extensive experimental results on Chinese broadcast news segments indicate that PSPL offers higher accuracy than CN but requires much larger disk space, while subword-based PSPL turns out to be very attractive because it lowers the storage cost while offering even higher accuracy.
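
The approach to approximating subword posteriors in a word lattice is not spelled out in this abstract; the sketch below shows one plausible, simplified reading of the idea, accumulating the posterior of every word hypothesis that contains a given subword unit at a given position. The function name, tokenization, and toy hypotheses are assumptions for illustration only.

```python
from collections import defaultdict

# Illustrative sketch, not the paper's exact method: approximate the posterior
# of a subword unit from word-level lattice posteriors by accumulating the
# posterior of every word hypothesis that contains that subword.

def subword_posteriors(word_hypotheses, tokenize):
    """word_hypotheses: list of (word, position, posterior) from a word lattice.
    tokenize: maps a word to its subword units (e.g. Mandarin characters).
    Returns accumulated posterior mass per (subword, position) pair."""
    posteriors = defaultdict(float)
    for word, position, posterior in word_hypotheses:
        for offset, unit in enumerate(tokenize(word)):
            posteriors[(unit, position + offset)] += posterior
    return posteriors

# Character tokenization for Mandarin: an OOV query can now be matched against
# accumulated character posteriors even if the full word was never in the
# recognizer's vocabulary.
hyps = [("台北", 0, 0.5), ("台中", 0, 0.3)]
print(subword_posteriors(hyps, tokenize=list))
# ('台', 0) receives 0.8, combining both competing word hypotheses.
```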


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Interactive Spoken Document Retrieval With Suggested Key Terms Ranked by a Markov Decision Process

Yi-cheng Pan; Hung-yi Lee; Lin-Shan Lee

Interaction with users is a powerful strategy that potentially yields better information retrieval for all types of media, including text, images, and videos. While spoken document retrieval (SDR) is a crucial technology for multimedia access in the network era, it is also more challenging than text information retrieval because of inevitable recognition errors. It is therefore reasonable to consider interactive functionalities for SDR systems. We propose an interactive SDR approach in which, given the user's query, the system returns not only the retrieval results but also a short list of key terms describing distinct topics. The user selects from these key terms to expand the query if the retrieval results are not satisfactory. The entire retrieval process is organized around a hierarchy of key terms that defines the allowable state transitions; this is modeled by a Markov decision process, which is widely used in spoken dialogue systems. Through reinforcement learning with simulated users, the key terms on the short list are ranked such that the retrieval success rate is maximized while the number of interactive steps is minimized. Significant improvements over existing approaches were observed in preliminary experiments performed on information needs provided by real users. A prototype system was also implemented.
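
The abstract describes ranking suggested key terms by reinforcement learning with simulated users over a Markov decision process; the toy tabular Q-learning sketch below illustrates that general recipe. The key-term set, reward values, and simulated user are invented for illustration and are not the paper's actual state space or reward design.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch for ranking suggested key terms.
# A state is the set of key terms already offered (a tuple); an action is
# suggesting one more key term. Everything here is an illustrative assumption.
KEY_TERMS = ["weather", "election", "baseball", "typhoon"]
RELEVANT = {"typhoon"}          # assumed hidden information need of the simulated user

def simulate_user(state, action):
    """Simulated user: reward +1 and terminate when the suggested term matches
    the hidden need; small penalty per extra interactive step otherwise."""
    if action in RELEVANT:
        return 1.0, True
    return -0.1, False

def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2):
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = (), False
        while not done and len(state) < len(KEY_TERMS):
            remaining = [t for t in KEY_TERMS if t not in state]
            if random.random() < epsilon:
                action = random.choice(remaining)
            else:
                action = max(remaining, key=lambda a: Q[(state, a)])
            reward, done = simulate_user(state, action)
            next_state = state + (action,)
            next_best = max((Q[(next_state, a)] for a in KEY_TERMS
                             if a not in next_state), default=0.0)
            Q[(state, action)] += alpha * (reward + gamma * next_best
                                           - Q[(state, action)])
            state = next_state
    return Q

Q = q_learning()
# Rank the key terms to suggest first (from the empty-query state):
print(sorted(KEY_TERMS, key=lambda a: Q[((), a)], reverse=True))
```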


International Symposium on Chinese Spoken Language Processing | 2006

Improved large vocabulary continuous Chinese speech recognition by character-based consensus networks

Yi-sheng Fu; Yi-cheng Pan; Lin-Shan Lee

Word-based consensus networks have been shown to be very useful in minimizing word error rates (WER) in large vocabulary continuous speech recognition for Western languages. By considering the special structure of the Chinese language, this paper points out that character-based rather than word-based consensus networks should work better for Chinese. This is verified by extensive experimental results also reported in the paper.
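
As a rough illustration of why character-level clustering can help for Chinese, the sketch below converts a single word-level confusion bin into character-level bins and votes per character position. This is an assumed simplification for illustration, not the paper's consensus network construction.

```python
from collections import defaultdict

# Illustrative sketch: competing word hypotheses in one confusion bin are split
# into characters, and the consensus character is chosen at each position.
# Chinese words are short character strings, so character bins align competing
# hypotheses more finely than word bins do.

def character_consensus(word_bin):
    """word_bin: list of (word, posterior) hypotheses competing in one slot.
    Returns the consensus character string chosen position by position."""
    char_bins = defaultdict(lambda: defaultdict(float))
    for word, posterior in word_bin:
        for pos, char in enumerate(word):
            char_bins[pos][char] += posterior
    consensus = []
    for pos in sorted(char_bins):
        best_char = max(char_bins[pos].items(), key=lambda kv: kv[1])[0]
        consensus.append(best_char)
    return "".join(consensus)

# Three competing two-character word hypotheses, two sharing the first character:
print(character_consensus([("台北", 0.4), ("台中", 0.35), ("抬愛", 0.25)]))
# -> "台北": the shared first character accumulates more mass than any single word.
```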


Spoken Language Technology Workshop | 2008

Latent semantic retrieval of spoken documents over position specific posterior lattices

Hung-lin Chang; Yi-cheng Pan; Lin-Shan Lee

This paper presents a new approach to latent semantic retrieval of spoken documents over Position-Specific Posterior Lattices (PSPL). The approach performs concept matching instead of literal term matching during retrieval, based on Probabilistic Latent Semantic Analysis (PLSA), so as to solve the problem of term mismatch between the query and the desired spoken documents. It operates over PSPL in order to consider the multiple hypotheses generated by the ASR process, as well as the position information for these hypotheses, thereby alleviating the problem of relatively poor ASR accuracy. We establish a framework to evaluate the semantic relevance between terms and the relevance score between a query and a PSPL, both based on the latent topic information from PLSA. Preliminary experiments on Chinese broadcast news segments showed that significant improvements can be obtained with the proposed approach.
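
As a simplified illustration of concept matching with latent topics, the sketch below scores a query against a document by comparing topic distributions derived from an assumed P(topic | term) table. The toy model, weights, and cosine scoring are illustrative assumptions rather than the paper's exact PLSA-over-PSPL formulation.

```python
import math

# Minimal sketch of concept matching with latent topics, assuming a topic model
# has already produced P(topic | term). All names and values are illustrative.

def topic_distribution(terms, p_topic_given_term, num_topics):
    """Average per-term topic posteriors into one distribution. On the document
    side, the weights would come from the PSPL posteriors of its hypothesized
    terms; here they are just given."""
    dist = [0.0] * num_topics
    uniform = [1.0 / num_topics] * num_topics
    for term, weight in terms:
        for k in range(num_topics):
            dist[k] += weight * p_topic_given_term.get(term, uniform)[k]
    total = sum(dist) or 1.0
    return [x / total for x in dist]

def relevance(query_dist, doc_dist):
    """Cosine similarity of topic distributions: a query and a document can
    match through shared topics even with no literal term overlap."""
    dot = sum(q * d for q, d in zip(query_dist, doc_dist))
    norm = (math.sqrt(sum(q * q for q in query_dist))
            * math.sqrt(sum(d * d for d in doc_dist)))
    return dot / norm if norm else 0.0

# Assumed toy model with 2 latent topics:
P = {"颱風": [0.9, 0.1], "氣象": [0.8, 0.2], "選舉": [0.1, 0.9]}
q = topic_distribution([("颱風", 1.0)], P, 2)
d = topic_distribution([("氣象", 0.7), ("選舉", 0.3)], P, 2)
print(relevance(q, d))   # high relevance despite no shared term
```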


IEEE Automatic Speech Recognition and Understanding Workshop | 2007

Type-II dialogue systems for information access from unstructured knowledge sources

Yi-cheng Pan; Lin-Shan Lee

In this paper, we present a formulation and framework for a new type of dialogue system, referred to here as type-II dialogue systems. The distinct feature of such dialogue systems is that their task is information access from unstructured knowledge sources; that is, they lack a well-organized back-end database offering the information the user needs. Typical example tasks for this type of dialogue system include retrieval, browsing, and question answering. Mainstream dialogue systems with a well-organized back-end database are accordingly referred to as type-I dialogue systems. The functionalities of each module in type-II dialogue systems are analyzed, presented, and compared with the respective modules in type-I dialogue systems. A preliminary type-II dialogue system recently developed at National Taiwan University is also presented at the end as a typical example.


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Voice-based information retrieval — how far are we from the text-based information retrieval?

Lin-Shan Lee; Yi-cheng Pan

Although network content access is primarily text-based today, almost all roles of text can also be accomplished by voice. Voice-based information retrieval refers to the situation in which the user query and/or the content to be retrieved are in the form of voice. This paper compares voice-based information retrieval with the currently very successful text-based information retrieval and identifies two major issues in which voice-based information retrieval lags far behind: retrieval accuracy and user-system interaction. These two issues are reviewed, analyzed, and discussed in detail. It is found that good approaches have been proposed and good improvements have been achieved, although there is still a long way to go. A few successful prototype systems, among many others, are presented at the end.


Spoken Language Technology Workshop | 2006

Simulation analysis for interactive retrieval of spoken documents with key terms ranked by reinforcement learning

Yi-cheng Pan; Lin-Shan Lee

Unlike written documents, spoken documents are difficult to display on the screen, and it is also difficult for users to browse them during retrieval. It has recently been proposed to use interactive multi-modal dialogues to help the user navigate through a spoken document archive to retrieve the desired documents. This interaction is based on a topic hierarchy constructed from the key terms extracted from the retrieved spoken documents. In this paper, the efficiency of the user interaction in such a system is further improved by a key term ranking algorithm using reinforcement learning with simulated users. Extensive simulation analysis was performed, and significant improvements in retrieval efficiency were observed. These improvements were also shown to be relatively robust to speech recognition errors.


Spoken Language Technology Workshop | 2008

Robustness analysis on lattice-based speech indexing approaches with respect to varying recognition accuracies by refined simulations

Yi-cheng Pan; Hung-lin Chang; Lin-Shan Lee

We analyze the robustness of different lattice-based speech indexing approaches. Although we believe such analysis is important, to our knowledge it has been neglected in prior work. To make up for the lack of corpora with varied noise characteristics, we use refined approaches to simulate feature vector sequences directly from HMMs, including sequences yielding a wide range of recognition accuracies, rather than simply adding noise and channel distortion to the existing corpora. We compare, analyze, and discuss the robustness of several state-of-the-art speech indexing approaches.
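
The refined simulation method itself is not detailed in this abstract; the sketch below only illustrates the basic idea of sampling feature-vector sequences directly from HMM parameters, with a spread parameter that can be varied to produce different effective recognition accuracies. All model parameters are made-up toy values, not those used in the paper.

```python
import numpy as np

# Illustrative sketch: generate feature-vector sequences directly from HMM
# parameters. A 3-state left-to-right HMM with Gaussian emissions over 2-D
# features; widening the emission spread degrades the simulated accuracy.

rng = np.random.default_rng(0)

transitions = np.array([[0.7, 0.3, 0.0],
                        [0.0, 0.7, 0.3],
                        [0.0, 0.0, 1.0]])
means = np.array([[0.0, 0.0], [2.0, 1.0], [4.0, -1.0]])
stddev = 0.5   # shared spherical standard deviation (toy value)

def sample_sequence(length=20):
    """Walk the HMM, emitting one Gaussian feature vector per frame."""
    state, frames = 0, []
    for _ in range(length):
        frames.append(rng.normal(means[state], stddev))
        state = rng.choice(3, p=transitions[state])
    return np.stack(frames)

features = sample_sequence()
print(features.shape)   # (20, 2) simulated feature vectors
```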


International Symposium on Chinese Spoken Language Processing | 2006

A multi-layered summarization system for multi-media archives by understanding and structuring of Chinese spoken documents

Lin-Shan Lee; Sheng-yi Kong; Yi-cheng Pan; Yi-sheng Fu; Yu-tsun Huang; Chien-chih Wang

Multimedia archives are very difficult to display on the screen and very difficult to retrieve and browse. It is therefore important to develop technologies that summarize entire archives of network content to help the user browse and retrieve them. In a recent paper [1] we proposed a complete set of multi-layered technologies to handle at least some of these issues: (1) automatic generation of titles and summaries for each spoken document, so that the spoken documents become much easier to browse; (2) global semantic structuring of the entire spoken document archive, offering the user a global picture of the semantic structure of the archive; and (3) query-based local semantic structuring for the subset of spoken documents retrieved by the user's query, providing the user with the detailed semantic structure of the relevant spoken documents given the query entered. Probabilistic Latent Semantic Analysis (PLSA) is found to be helpful. This paper presents an initial prototype system for Chinese archives with the functions mentioned above, in which a broadcast news archive in Mandarin Chinese is taken as the example archive.

Collaboration


Dive into Yi-cheng Pan's collaborations.

Top Co-Authors

Lin-Shan Lee, National Taiwan University
Yi-sheng Fu, National Taiwan University
Hung-lin Chang, National Taiwan University
Yu-tsun Huang, National Taiwan University
Chien-chih Wang, National Taiwan University
Sheng-yi Kong, National Taiwan University
Yen-shin Lee, National Taiwan University
Chia-Hsing Yu, National Taiwan University
Hung-yi Lee, National Taiwan University
Te-hsuan Lee, National Taiwan University