Publication


Featured research published by Justin Jian Zhang.


North American Chapter of the Association for Computational Linguistics (NAACL) | 2007

Speech Summarization Without Lexical Features for Mandarin Broadcast News

Justin Jian Zhang; Pascale Fung

We present the first known empirical study on speech summarization without lexical features for Mandarin broadcast news. We evaluate acoustic, lexical, and structural features as predictors of summary sentences. We find that the summarizer achieves a good average F-measure of 0.5646 using the combination of acoustic and structural features alone, independent of lexical features. In addition, we show that structural features are superior to lexical features, and that our summarizer performs surprisingly well, at an average F-measure of 0.3914, using only acoustic features. These findings enable us to summarize speech without placing a stringent demand on speech recognition accuracy.
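
As a rough illustration of the feature-based setup described above, the sketch below trains a binary classifier over acoustic and structural features and keeps the top-scoring utterances. The feature fields, the logistic-regression classifier, and the 10% extraction ratio are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of feature-based extractive summarization: classify each
# utterance as in-summary or not from acoustic + structural features alone.
# Feature names and the classifier choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def utterance_features(utt):
    """Acoustic + structural features for one utterance (hypothetical fields)."""
    return [
        utt["f0_mean"],      # acoustic: mean pitch
        utt["f0_range"],     # acoustic: pitch range
        utt["energy_mean"],  # acoustic: mean energy
        utt["duration"],     # acoustic: utterance length in seconds
        utt["position"],     # structural: relative position in the story
        utt["is_first"],     # structural: 1 if first utterance of the story
    ]

def train_summarizer(utterances, labels):
    X = np.array([utterance_features(u) for u in utterances])
    return LogisticRegression(max_iter=1000).fit(X, labels)

def summarize(model, utterances, ratio=0.1):
    X = np.array([utterance_features(u) for u in utterances])
    scores = model.predict_proba(X)[:, 1]      # P(utterance in summary)
    k = max(1, int(len(utterances) * ratio))
    keep = sorted(np.argsort(scores)[-k:])     # top-k, kept in original order
    return [utterances[i] for i in keep]
```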


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2007

Improving lecture speech summarization using rhetorical information

Justin Jian Zhang; Ho Yin Chan; Pascale Fung

We propose a novel method for extractive summarization of lecture speech based on unsupervised learning of its rhetorical structure. We present empirical evidence that rhetorical structure is the underlying semantics, which is then rendered in linguistic and acoustic/prosodic forms in lecture speech. We present the first thorough investigation of the relative contribution of linguistic versus acoustic features and show that, at least for lecture speech, what is said is more important than how it is said. We base our experiments on conference speeches and the corresponding presentation slides, since the latter are a faithful description of the rhetorical structure of the former. We find that discourse features from broadcast news are not applicable to lecture speech. By using rhetorical structure information, our summarizer reaches a 67.87% ROUGE-L F-measure at 30% compression, surpassing all previously reported results as well as the 66.47% ROUGE-L F-measure of the baseline summarizer without rhetorical information. We also show that, despite a 29.7% character error rate in speech recognition, extractive summarization performs relatively well, underlining the fact that the spontaneity of lecture speech does not obscure its central meaning.
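
All of these papers report ROUGE-L F-measure, which scores a candidate summary by the longest common subsequence (LCS) it shares with a reference summary. A minimal sketch of the standard computation, assuming tokenized summaries as input:

```python
# Sketch of the ROUGE-L F-measure used throughout these papers: recall and
# precision of the longest common subsequence between candidate and reference,
# combined into an F-score.
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def rouge_l_f(candidate, reference, beta=1.0):
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    recall = lcs / len(reference)
    precision = lcs / len(candidate)
    return (1 + beta**2) * recall * precision / (recall + beta**2 * precision)
```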


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Extractive Speech Summarization Using Shallow Rhetorical Structure Modeling

Justin Jian Zhang; Ricky Ho Yin Chan; Pascale Fung

We propose an extractive summarization approach with a novel shallow rhetorical structure learning framework for speech summarization. One of the most under-utilized features in extractive summarization is hierarchical structure information: semantically cohesive units that are hidden in spoken documents. We first present empirical evidence that rhetorical structure is the underlying semantic information, which is rendered in linguistic and acoustic/prosodic forms in lecture speech. A segmental summarization method, in which the document is partitioned into rhetorical units by K-means clustering, is first proposed to test this hypothesis. We show that this system produces summaries at a 67.36% ROUGE-L F-measure, a 4.29% absolute increase over the baseline system. We then propose Rhetorical-State Hidden Markov Models (RSHMMs) to automatically decode the underlying hierarchical rhetorical structure in speech. Tenfold cross-validation experiments are carried out on conference speeches. We show that the RSHMM-based system gives a 71.31% ROUGE-L F-measure, an 8.24% absolute increase in lecture speech summarization performance over the baseline system without RSHMMs; it equally outperforms the baseline augmented with a conventional discourse feature. We also present a thorough investigation of the relative contribution of different features and show that, for lecture speech, speaker-normalized acoustic features contribute the most, at a 68.5% ROUGE-L F-measure, compared with 62.9% for linguistic features and 59.2% for un-normalized acoustic features. This shows that each speaker's individual speaking style is highly relevant to summarization.
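
The segmental step above partitions a transcript into rhetorical units via K-means. A minimal sketch of one plausible reading, assuming per-utterance feature vectors and treating changes in cluster label as unit boundaries; the boundary rule is an illustrative simplification, not the paper's exact procedure.

```python
# Sketch of segmental partitioning: cluster utterance feature vectors with
# K-means, then cut the transcript wherever the cluster label changes.
import numpy as np
from sklearn.cluster import KMeans

def rhetorical_units(features, n_states=4, seed=0):
    """features: (n_utterances, n_dims) array; returns list of (start, end) spans."""
    labels = KMeans(n_clusters=n_states, random_state=seed, n_init=10).fit_predict(features)
    spans, start = [], 0
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:   # cluster change -> unit boundary
            spans.append((start, i))
            start = i
    spans.append((start, len(labels)))
    return spans
```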


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2008

Rhetorical-State Hidden Markov Models for extractive speech summarization

Pascale Fung; Ricky Ho Yin Chan; Justin Jian Zhang

We propose an extractive summarization system with a novel non-generative probabilistic framework for speech summarization. One of the most underutilized features in extractive summarization is rhetorical information: semantically cohesive units that are hidden in spoken documents. We propose Rhetorical-State Hidden Markov Models (RSHMMs) to automatically decode this underlying structure in speech. We show that RSHMMs give a 71.69% ROUGE-L F-measure, a 5.69% absolute increase in lecture speech summarization performance over the baseline system without RSHMMs. They equally outperform the baseline system with additional discourse features, showing that our RSHMM is a more refined improvement on the conventional discourse feature.
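
The paper does not spell out RSHMM internals here, but decoding a hidden rhetorical-state sequence is standard Viterbi inference. The sketch below is a generic Viterbi decoder over placeholder parameters, not the trained RSHMM.

```python
# Generic Viterbi decoding as a sketch of what RSHMM inference involves:
# recover the most likely rhetorical-state sequence for a document given
# per-utterance emission log-probabilities.
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """log_pi: (S,) initial; log_A: (S, S) transitions; log_B: (T, S) emissions."""
    T, S = log_B.shape
    delta = log_pi + log_B[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A   # scores[from_state, to_state]
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]                      # rhetorical state per utterance
```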


IEEE Spoken Language Technology Workshop (SLT) | 2008

RSHMM++ for extractive lecture speech summarization

Justin Jian Zhang; Shilei Huang; Pascale Fung

We propose an enhanced Rhetorical-State Hidden Markov Model (RSHMM++) for extracting hierarchical structural summaries from lecture speech. One of the most underutilized sources of information in extractive summarization is the rhetorical structure hidden in speech data. RSHMM++ automatically decodes this underlying information in order to provide better summaries. We show that RSHMM++ gives a 72.01% ROUGE-L F-measure, a 9.78% absolute increase in lecture speech summarization performance over the baseline system without rhetorical information. We also propose Relaxed Dynamic Time Warping (RDTW) for compiling reference summaries.
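
RDTW aligns slide content with transcript utterances when compiling reference summaries. The sketch below uses plain DTW over a cosine-distance matrix; the paper's "relaxed" constraints are not reproduced here, and row-normalized input matrices are assumed.

```python
# Sketch of DTW alignment between slide text and transcript utterances, the
# kind of alignment RDTW performs when compiling reference summaries.
import numpy as np

def dtw_align(slide_vecs, utt_vecs):
    """Both inputs are row-normalized feature matrices; returns aligned index pairs."""
    dist = 1.0 - slide_vecs @ utt_vecs.T   # cosine distance matrix
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i-1, j-1] + min(acc[i-1, j], acc[i, j-1], acc[i-1, j-1])
    # Backtrace the minimum-cost warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i-1, j-1], acc[i-1, j], acc[i, j-1]])
        i, j = (i - 1, j - 1) if step == 0 else ((i - 1, j) if step == 1 else (i, j - 1))
    return path[::-1]
```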


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Automatic Parliamentary Meeting Minute Generation Using Rhetorical Structure Modeling

Justin Jian Zhang; Pascale Fung

In this paper, we propose a one-step rhetorical structure parsing, chunking, and extractive summarization approach that automatically generates meeting minutes from parliamentary speech using acoustic and lexical features. We investigate how to use lexical features extracted from imperfect ASR transcriptions, together with acoustic features extracted from the speech itself, to form extractive summaries with the structure of meeting minutes. Each business item in the minutes is modeled as a rhetorical chunk consisting of smaller rhetorical units. Principal Component Analysis (PCA) graphs of both acoustic and lexical features in meeting speech show clear self-clustering of speech utterances according to the underlying rhetorical state; for example, acoustic and lexical feature vectors from the question-and-answer or motion portions of a parliamentary session are grouped together. We then propose a Conditional Random Field (CRF)-based approach that performs rhetorical structure modeling and extractive summarization in one step, by chunking, parsing, and extracting salient utterances. Extracted salient utterances are grouped under the label of their rhetorical state, emulating meeting minutes to yield summaries that are more easily understood by humans. We compare this approach to different machine learning methods and show that our proposed CRF-based one-step minute generation system obtains the best summarization performance, both in terms of ROUGE-L F-measure, at 74.5%, and by human evaluation, at 77.5% on average.
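
One way to read the one-step formulation is as sequence labeling with joint labels that encode both rhetorical state and salience, so chunking and extraction fall out of a single CRF pass. The sketch below assumes sklearn-crfsuite as the CRF implementation and invents a small label scheme and feature fields purely for illustration.

```python
# Sketch of one-step CRF labeling in the spirit of the paper: each utterance
# gets a joint label such as "motion/salient" or "qa/skip", so rhetorical
# chunking and extraction happen in one sequence-labeling pass.
import sklearn_crfsuite

def utt_to_features(utt):
    """Per-utterance feature dict (hypothetical acoustic/lexical fields)."""
    return {
        "f0_mean": utt["f0_mean"],
        "energy_mean": utt["energy_mean"],
        "first_word": utt["words"][0].lower(),
        "n_words": len(utt["words"]),
    }

def train_minute_crf(docs, label_seqs):
    """docs: list of utterance lists; label_seqs: e.g. [["motion/salient", ...]]."""
    X = [[utt_to_features(u) for u in doc] for doc in docs]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
    crf.fit(X, label_seqs)
    return crf

def generate_minutes(crf, doc):
    labels = crf.predict([[utt_to_features(u) for u in doc]])[0]
    # Group salient utterances under their rhetorical-state heading.
    minutes = {}
    for utt, lab in zip(doc, labels):
        state, salience = lab.split("/")
        if salience == "salient":
            minutes.setdefault(state, []).append(utt["text"])
    return minutes
```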


ACM Transactions on Speech and Language Processing | 2012

Active learning with semi-automatic annotation for extractive speech summarization

Justin Jian Zhang; Pascale Fung

We propose using active learning for extractive speech summarization in order to reduce the human effort of generating reference summaries. Active learning chooses a selective set of samples to be labeled; we propose a combination of informativeness and representativeness criteria for this selection. We further propose a semi-automatic method to generate reference summaries for presentation speech by using Relaxed Dynamic Time Warping (RDTW) alignment between the presentation speech and its accompanying slides. Our summarization results show that the amount of labeled data needed for a given summarization accuracy can be reduced by more than 23% compared to random sampling.
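
A minimal sketch of the combined selection criterion, assuming row-normalized document vectors, an uncertainty-style informativeness score, an average-similarity representativeness score, and an equal weighting; all of these specifics are assumptions rather than the paper's definitions.

```python
# Sketch of combined active-learning selection: prefer documents the current
# model is least certain about (informativeness) that are also similar to the
# rest of the pool (representativeness).
import numpy as np

def select_for_labeling(model, pool_vecs, k=5, alpha=0.5):
    """pool_vecs: (n_docs, n_dims) row-normalized document vectors."""
    proba = model.predict_proba(pool_vecs)[:, 1]
    informativeness = 1.0 - np.abs(proba - 0.5) * 2   # 1 at p=0.5, 0 at p=0 or 1
    sims = pool_vecs @ pool_vecs.T                    # cosine similarities
    representativeness = (sims.sum(axis=1) - 1.0) / (len(pool_vecs) - 1)
    score = alpha * informativeness + (1 - alpha) * representativeness
    return np.argsort(score)[-k:]                     # indices to hand-label
```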


Annual Meeting of the Association for Computational Linguistics (ACL) | 2009

Active Learning of Extractive Reference Summaries for Lecture Speech Summarization

Justin Jian Zhang; Pascale Fung

We propose using active learning for tagging extractive reference summaries of lecture speech. Training a feature-based summarization model usually requires a large amount of training data with high-quality reference summaries. Human production of such summaries is tedious and, since inter-labeler agreement is low, unreliable. Active learning alleviates this problem by automatically selecting a small number of unlabeled documents for humans to hand-correct. Our method chooses the unlabeled documents according to the similarity score between each document and a comparable resource, the PowerPoint slides. After manual correction, the selected documents are returned to the training pool. Summarization results show an increasing learning curve of ROUGE-L F-measure, from 0.44 to 0.514, consistently higher than that obtained with randomly chosen training samples.
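
The selection step scores each unlabeled transcript against its comparable resource. A short sketch, assuming TF-IDF cosine similarity as the similarity score; the paper's exact measure is not reproduced here.

```python
# Sketch of similarity-based selection: score each transcript by its TF-IDF
# cosine similarity to the accompanying slide text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def slide_similarity(transcripts, slide_texts):
    """Parallel lists of transcript and slide strings; returns one score each."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(transcripts + slide_texts)
    n = len(transcripts)
    return [cosine_similarity(X[i], X[n + i])[0, 0] for i in range(n)]
```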


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2007

A Mandarin lecture speech transcription system for speech summarization

Ho Yin Chan; Justin Jian Zhang; Pascale Fung; Lu Cao

This paper introduces our work on Mandarin lecture speech transcription. In particular, we present our work on a small database containing only 16 hours of audio data and 0.16M words of text data. A range of experiments has been carried out to improve the performance of the acoustic model and the language model; these include adapting lecture speech data to read speech data for acoustic modeling, and using lecture conference papers, PowerPoint slides, and similar-domain web data for language modeling. We also study the effects of automatic segmentation, unsupervised acoustic model adaptation, and language model adaptation in our recognition system. By using a 3×RT multi-pass decoding strategy, we obtain 70.3% accuracy in our final system. Finally, we feed the output of our speech transcription system into an SVM summarizer and obtain a ROUGE-L F-measure of 66.5%.
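
Folding in-domain text (papers, slides, web data) into the language model is commonly done by linear interpolation with a background LM. A minimal sketch of that generic idea, with the interpolation weight as a tunable assumption; this is not necessarily the paper's exact adaptation scheme.

```python
# Sketch of LM adaptation by linear interpolation: mix a small in-domain
# language model with a large background model.
def interpolate_lm(p_background, p_indomain, lam=0.3):
    """Return an interpolated probability function over (word, history)."""
    def p(word, history):
        return lam * p_indomain(word, history) + (1 - lam) * p_background(word, history)
    return p
```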


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2010

Learning deep rhetorical structure for extractive speech summarization

Justin Jian Zhang; Pascale Fung

Extractive summarization of conference and lecture speech is useful for online learning and reference. We show for the first time that deep(er) rhetorical parsing of conference speech is possible and helpful to the extractive summarization task. This type of rhetorical structure is evident in the structure of the corresponding presentation slides. We propose using a Hidden Markov SVM (HMSVM) to iteratively learn the rhetorical structure of the speeches and summarize them. We show that the HMSVM-based system gives a 64.3% ROUGE-L F-measure, a 10.1% absolute increase in lecture speech summarization performance over the baseline system without rhetorical information. Our method equally outperforms the baseline with a conventional discourse feature, and our proposed approach is more efficient than, and improves upon, a previous method using shallow rhetorical structure parsing [1].
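
As a crude stand-in for the iterative learning loop described above, the sketch below alternates between fitting a linear SVM on the current rhetorical-state labels and relabeling utterances with it. This plain self-training loop is an approximation for illustration, not the authors' Hidden Markov SVM.

```python
# Sketch of iterative structure learning: alternately refit a discriminative
# model on the current rhetorical-state labels and re-decode the labels.
import numpy as np
from sklearn.svm import LinearSVC

def iterative_structure_learning(features, init_labels, n_iter=5):
    """features: (n_utts, n_dims); init_labels: initial rhetorical-state guesses."""
    labels = np.array(init_labels)
    model = None
    for _ in range(n_iter):
        model = LinearSVC(max_iter=5000).fit(features, labels)
        new_labels = model.predict(features)
        if np.array_equal(new_labels, labels):   # labels stabilized: converged
            break
        labels = new_labels
    return model, labels
```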

Collaboration


Dive into Justin Jian Zhang's collaborations.

Top Co-Authors

Pascale Fung
Hong Kong University of Science and Technology

Ricky Ho Yin Chan
Hong Kong University of Science and Technology

Ho Yin Chan
Hong Kong University of Science and Technology

Shilei Huang
Hong Kong University of Science and Technology

Lu Cao
Hong Kong University of Science and Technology