Guoyun Lv

Northwestern Polytechnical University

Publication

Featured research published by Guoyun Lv.

International Symposium on Multimedia | 2007

Multi-stream Asynchrony Modeling for Audio-Visual Speech Recognition

Guoyun Lv; Dongmei Jiang; Rongchun Zhao; Yunshu Hou

In this paper, two multi-stream asynchrony Dynamic Bayesian Network models (the MS-ADBN model and the MM-ADBN model) are proposed for audio-visual speech recognition (AVSR). The proposed models, which have different topologies, relax the asynchrony of the audio and visual streams to the word level. In the MS-ADBN model, each word in both the audio stream and the visual stream is composed of its corresponding phones, and each phone is associated with an observation vector. The MM-ADBN model augments the MS-ADBN model: a level of hidden nodes, the state level, is added between the phone level and the observation node level to describe the dynamic process of phones. Essentially, the MS-ADBN model is a word model, while the MM-ADBN model is a phone model. Speech recognition experiments are carried out on a digit audio-visual (A-V) database and on a continuous A-V database. The results demonstrate that describing the asynchrony between the audio and visual streams is important for an AVSR system, and that the MM-ADBN model performs best on continuous A-V speech recognition.

International Conference on Communications | 2007

A Robust Visual Feature Extraction based BTSM-LDA for Audio-Visual Speech Recognition

Guoyun Lv; Rongchun Zhao; Dongmei Jiang; Yan Li; Hichem Sahli

Asynchrony between speech and lip movement is a key problem in audio-visual speech recognition (AVSR) systems. A multi-stream asynchrony dynamic Bayesian network (MS-ADBN) model is proposed for audio-visual speech recognition. Compared with the multi-stream HMM (MSHMM), the MS-ADBN model describes the asynchrony of the audio and visual streams up to the word level. In addition, based on the lip profile obtained with a Bayesian tangent shape model (BTSM), linear discriminant analysis (LDA) is used for visual feature extraction; the resulting features describe the dynamics of the lip and remove the redundancy of the lip geometrical features. Experimental results on a continuous digit audio-visual database show that the lip dynamic features based on BTSM and LDA are more stable and robust than direct lip geometrical features. In noisy environments with signal-to-noise ratios ranging from 0 dB to 30 dB, the MS-ADBN model with MFCC and LDA visual features improves the speech recognition rate by an average of 4.92% compared with MSHMM.

Multimedia Tools and Applications | 2011

Fractal and neural networks based watermark identification

Li Mao; Yangyu Fan; Hui-qin Wang; Guoyun Lv

Transform-domain techniques are generally more robust than spatial-domain techniques for watermark embedding. In this paper, a color image watermarking algorithm based on fractal coding and neural networks in the Discrete Cosine Transform (DCT) domain is proposed. We apply a fractal image coding technique to obtain the characteristic data of a gray-level image watermark signal and encrypt the characteristic data with a symmetric encryption algorithm before they are embedded. We then use neural networks and the Human Visual System (HVS) to embed the watermark in the DCT domain. A Just Noticeable Difference (JND) threshold controller is designed to ensure that the strength of the embedded data adapts entirely to the host image itself. To address misjudgment during extraction, the maximum membership principle is selected as the criterion for identifying the watermark. The CIELab color space is chosen to guarantee the stability of the results. The simulation results show that the algorithm is robust against attacks by common digital image processing methods and that the quality of the image is retained.

International Congress on Image and Signal Processing | 2009

Adaptive Color Image Watermarking Algorithm Based on Fractal and Neural Networks

Li Mao; Yangyu Fan; Guoyun Lv; Hui-qin Wang

In this paper, a color image watermarking algorithm based on fractal coding and neural networks in the Discrete Cosine Transform (DCT) domain is proposed. First, the algorithm uses a fractal image coding technique to obtain the characteristic data of a gray-level image watermark signal and encrypts the data with a symmetric encryption algorithm before they are embedded. Second, by exploiting the abilities of neural networks and considering the characteristics of the Human Visual System (HVS), a Just Noticeable Difference (JND) threshold controller is designed to ensure that the strength of the embedded data adapts entirely to the host image itself. The watermark scheme thus possesses dual security characteristics. To improve the robustness of the algorithm, simple repetition of the results is applied to the watermark configuration. The CIELab color space is chosen to guarantee the stability of the results. Experimental results show that the watermark embedded by the proposed algorithm is invisible and robust against commonly used image-processing methods.

Keywords: Fractal; Neural Networks; HVS; DCT; color image watermarking

Circuits Systems and Signal Processing | 2015

A Pseudo-Natural Sampling Algorithm for Low-Cost Low-Distortion Asymmetric Double-Edge PWM Modulators

Zeqi Yu; Yangyu Fan; Longfei Shi; Guoyun Lv

In this paper, a pseudo-natural sampling algorithm for correcting the harmonic distortion produced by asymmetric double-edge uniform-sampling pulse-width modulation is proposed. The algorithm uses the decomposability of the asymmetric double-edge pulse-width modulation process and the Lagrange numerical differentiation method for calculating the pseudo-natural sampling points to obtain a harmonic distortion correction effect. The computational complexity of the algorithm is low because it requires only three shifts, six additions and three multiplications to calculate each of the pseudo-natural sampling points. A complete experimental system based on a single field programmable gate array was built to verify the effectiveness of the proposed algorithm and to compare it with other reported kindred algorithms. The results obtained show that the proposed algorithm has lower hardware requirements and better harmonic distortion correction than other related algorithms.
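
The idea of approximating natural sampling from uniformly spaced samples can be illustrated with a much simpler case than the one in the abstract above. The sketch below is an assumption-laden illustration, not the published algorithm: it uses a single-edge (trailing-edge) sawtooth carrier rather than asymmetric double-edge modulation, a first-order model of the signal within one period, and a three-point Lagrange (central-difference) slope estimate. The names `signal`, `pseudo_natural_duty`, `natural_duty` and the constant `SAMPLES_PER_CYCLE` are hypothetical.

```python
import numpy as np

SAMPLES_PER_CYCLE = 48  # hypothetical: PWM periods per cycle of the test tone


def signal(t):
    """Hypothetical test tone with values in (0, 1); t is in PWM periods."""
    return 0.5 + 0.4 * np.sin(2.0 * np.pi * t / SAMPLES_PER_CYCLE)


def pseudo_natural_duty(x, n):
    """First-order estimate of the natural-sampling duty cycle in period n.

    The carrier is assumed to ramp from 0 to 1 over one period. Modelling
    the signal as x[n] + slope * d and equating it to the carrier value d
    gives d = x[n] / (1 - slope). The slope per period comes from the
    three-point Lagrange (central-difference) formula.
    """
    slope = 0.5 * (x[n + 1] - x[n - 1])
    return x[n] / (1.0 - slope)


def natural_duty(n, iterations=50):
    """Reference duty cycle: bisection on signal(n + d) = d for d in [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if signal(n + mid) > mid:
            lo = mid  # the crossing lies later in the period
        else:
            hi = mid  # the crossing lies earlier in the period
    return 0.5 * (lo + hi)


# Compare uniform sampling (duty = x[n]) and the corrected estimate
# against the bisection reference over one cycle of the test tone.
x = signal(np.arange(SAMPLES_PER_CYCLE + 2))
periods = range(1, SAMPLES_PER_CYCLE + 1)
reference = np.array([natural_duty(n) for n in periods])
corrected = np.array([pseudo_natural_duty(x, n) for n in periods])
uniform_error = np.max(np.abs(x[1:-1] - reference))
corrected_error = np.max(np.abs(corrected - reference))
print(uniform_error, corrected_error)
```

The central difference needs the sample `x[n + 1]`, so this sketch introduces one period of latency; the abstract does not say whether the published method has the same property, and the operation count quoted there does not apply to this simplified version.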

Multimedia Tools and Applications | 2016

Line detection algorithm based on adaptive gradient threshold and weighted mean shift

Yi Wang; Liangliang Yu; Houqi Xie; Tao Lei; Zhe Guo; Min Qi; Guoyun Lv; Yangyu Fan; Yilong Niu

Line detection is a classical problem in computer vision and image processing, and it is widely used as a basic method. Most existing line detection algorithms are based on edge information, whose discontinuity limits the detection result. Some other algorithms use only gradient magnitudes and neglect gradient directions. In this paper, an adaptive gradient threshold and omni-direction line growing method based on line detection with a weighted mean shift procedure and a 2D slice sampling strategy (referred to as LSWMSAllDir) is proposed. It makes full use of the magnitudes and directions of the gradient to detect lines in the image. Experiments on synthetic data and real scene image data showed that the improved algorithm was the most accurate when compared with the Progressive Probabilistic Hough Transform (PPHT), line segment detector (LSD), parameter-free edge drawing (EDPF) and original line segment detection using weighted mean shift (LSWMS) algorithms.

International Congress on Image and Signal Processing | 2015

A novel low bit rate speech secure terminal system based on PSTN line

Tiantian Ma; Guoyun Lv; Junsheng Li; Yabin Zhao

In this paper, a novel low bit rate speech secure terminal system is designed to meet the strict requirement of a very low bit error rate in data communication over a Public Switched Telephone Network (PSTN) line. The system uses an ARM+FPGA architecture; its main purpose is to balance voice quality against coding efficiency so as to obtain a lower bit error rate while guaranteeing speech quality. In this system, the secure mode adopts the 128-bit AES algorithm, audio encoding uses the WT2000 chip, which is used for aviation voice communication, and digital modulation uses the CX93011. Finally, experimental results under different complex conditions show that the system is very stable and secure and that clear voice can be obtained; the bit error rate of encrypted communication in the most complex PSTN conditions, with ADSL, is only 5.7 × 10^-6.

International Conference on Signal Processing | 2014

Single image dehazing based on multiple scattering model

Xipan Lu; Guoyun Lv; Tao Lei

Most existing dehazing algorithms are based on the assumption of single scattering, but multiple scattering is significant and cannot be ignored. To obtain a better dehazing effect, we first analyze multiple atmospheric scattering in detail and propose an image degradation model based on multiple scattering. Second, we propose an improved method based on the dark channel prior. The proposed algorithm uses the semi-inverse algorithm to determine the foggy area, estimates the atmospheric light A from the most concentrated foggy area, estimates the transmission from the minimum channel of RGB, and finally recovers a clear image according to the multiple scattering model. Experimental results demonstrate that the proposed method achieves good restoration of contrast and color fidelity and has low computational complexity.

International Conference on Audio, Language and Image Processing | 2014

Speech emotion recognition based on dynamic models

Guoyun Lv; Shuixian Hu; Xipan Lu

This paper introduces the semi-continuous Hidden Markov Model (HMM) and proposes a novel Dynamic Bayesian Network (DBN) model for dynamic speech emotion recognition. The former reduces the training complexity caused by Gaussian mixtures by sharing the Conditional Probability Densities (CPDs) of the Gaussians among the states. The latter adds a sub-state layer between the state layer and the observation layer of the traditional DBN framework and describes the dynamic process of speech emotion in detail. Experimental results show that the average emotion recognition rate of the semi-continuous HMM is 4% and 10% higher than those of the classical HMM and the Mixture Gaussian HMM respectively, and that the average emotion recognition rate of the three-layer DBN model is 11% and 8% higher than those of the traditional DBN model and the semi-continuous HMM respectively.
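
The parameter sharing described in the abstract above can be sketched with the textbook semi-continuous (tied-mixture) formulation, in which every state draws on one shared set of Gaussians and keeps only its own mixture weights. This is a generic illustration under that assumption, not the paper's implementation; the function names `gaussian_pdf` and `emission_probs` and the diagonal-covariance choice are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian evaluated at vector x."""
    norm = np.prod(2.0 * np.pi * var) ** -0.5
    return norm * np.exp(-0.5 * np.sum((x - mean) ** 2 / var))


def emission_probs(x, means, variances, weights):
    """Emission likelihood of observation x for every state.

    means, variances: arrays of shape (K, D), one shared Gaussian codebook.
    weights: array of shape (S, K); row j holds the mixture weights of
             state j and is assumed to sum to 1.
    Returns an array of shape (S,).
    """
    # Each shared Gaussian is evaluated once per observation,
    # no matter how many states there are.
    shared = np.array(
        [gaussian_pdf(x, m, v) for m, v in zip(means, variances)]
    )
    # Each state mixes the same densities with its own weights.
    return weights @ shared
```

With S states and K shared Gaussians, the Gaussian parameters to train number K rather than S times the per-state mixture size, which is the kind of saving the abstract attributes to sharing.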

International Conference on Audio, Language and Image Processing | 2014

Fast single image dehazing algorithm

Xipan Lu; Guoyun Lv; Tao Lei

Images captured in foggy weather conditions often suffer from poor visibility. In this paper, we propose an improved dark channel prior method. Using the semi-inverse algorithm, the proposed algorithm can accurately identify the foggy area and effectively obtain the global atmospheric light. The transmission is then obtained from the minimum channel of RGB. Finally, the scene albedo is recovered by inverting the atmospheric scattering model. The main advantage of the proposed algorithm over others is its higher speed, which allows it to be applied in real-time processing applications. A comparative experiment with several existing state-of-the-art algorithms shows that the proposed method achieves good restoration of contrast and color fidelity.
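
The last two steps in the abstract above (transmission from the minimum RGB channel, then inversion of the scattering model) can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's method: it assumes the common single-scattering model I = J·t + A·(1 − t), replaces the semi-inverse foggy-area detection with a crude guess that takes the atmospheric light from the pixel with the largest minimum channel, and uses hypothetical parameters `omega` and `t_min`.

```python
import numpy as np


def dehaze(image, omega=0.95, t_min=0.1):
    """Rough single-image dehazing sketch.

    image: float array of shape (H, W, 3) with values in [0, 1].
    Returns (restored image, transmission map).
    """
    # Minimum over the R, G and B channels at each pixel.
    min_channel = image.min(axis=2)

    # Assumption: use the colour of the pixel with the largest minimum
    # channel as the atmospheric light A. The paper instead locates the
    # foggy area with the semi-inverse algorithm.
    row, col = np.unravel_index(np.argmax(min_channel), min_channel.shape)
    A = image[row, col]

    # Transmission from the minimum channel of the A-normalised image.
    normalised = image / np.maximum(A, 1e-6)
    t = 1.0 - omega * normalised.min(axis=2)
    t = np.clip(t, t_min, 1.0)  # the lower bound avoids amplifying noise

    # Invert I = J * t + A * (1 - t) for the scene radiance J.
    J = (image - A) / t[..., None] + A
    return np.clip(J, 0.0, 1.0), t
```

Every step here is a per-pixel operation with no patch-wise filtering or refinement pass, which is one plausible reason a minimum-channel approach can be fast; the abstract itself does not give timing details.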
