Haojin Yang
Hasso Plattner Institute
Publication
Featured research published by Haojin Yang.
IEEE Transactions on Learning Technologies | 2014
Haojin Yang; Christoph Meinel
In the last decade, e-lecturing has become more and more popular. The amount of lecture video data on the World Wide Web (WWW) is growing rapidly. Therefore, a more efficient method for video retrieval on the WWW or within large lecture video archives is urgently needed. This paper presents an approach for automated video indexing and video search in large lecture video archives. First, we apply automatic video segmentation and key-frame detection to offer a visual guideline for video content navigation. Subsequently, we extract textual metadata by applying video Optical Character Recognition (OCR) technology on key-frames and Automatic Speech Recognition (ASR) on lecture audio tracks. The OCR and ASR transcripts as well as detected slide text line types are adopted for keyword extraction, by which both video- and segment-level keywords are extracted for content-based video browsing and search. The performance and effectiveness of the proposed indexing functionalities are demonstrated by evaluation.
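As an illustration of the segmentation and key-frame OCR stage sketched in this abstract, the following is a minimal example, assuming OpenCV and pytesseract; the sampling interval and difference threshold are illustrative placeholders, not the paper's values.

```python
# Minimal sketch of slide segmentation plus key-frame OCR.
# Assumes OpenCV (cv2) and pytesseract are installed; threshold
# and sampling rate are illustrative, not the paper's values.
import cv2
import pytesseract

def detect_key_frames(video_path, sample_every=30, diff_threshold=12.0):
    """Sample frames and emit one key frame per detected slide change."""
    cap = cv2.VideoCapture(video_path)
    key_frames, prev_gray, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev_gray is None or cv2.absdiff(gray, prev_gray).mean() > diff_threshold:
                key_frames.append((idx, frame))  # slide transition assumed here
            prev_gray = gray
        idx += 1
    cap.release()
    return key_frames

def ocr_key_frames(key_frames):
    """Extract textual metadata from each key frame for indexing."""
    return {idx: pytesseract.image_to_string(frame) for idx, frame in key_frames}
```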
Multimedia Tools and Applications | 2014
Haojin Yang; Bernhard Quehl; Harald Sack
Text displayed in a video is an essential part of the high-level semantic information of the video content. Therefore, video text can be used as a valuable source for automated video indexing in digital video libraries. In this paper, we propose a workflow for video text detection and recognition. In the text detection stage, we have developed a fast localization-verification scheme, in which an edge-based multi-scale text detector first identifies potential text candidates with a high recall rate. Detected candidate text lines are then refined using an image entropy-based filter. Finally, Stroke Width Transform (SWT)- and Support Vector Machine (SVM)-based verification procedures are applied to eliminate false alarms. For text recognition, we have developed a novel skeleton-based binarization method that separates text from complex backgrounds and makes it processable by standard OCR (Optical Character Recognition) software. The operability and accuracy of the proposed text detection and binarization methods have been evaluated on publicly available test data sets.
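The entropy-based refinement step can be sketched as follows, assuming 8-bit grayscale input; the minimum-entropy threshold is an assumed value rather than the paper's tuned parameter.

```python
# Sketch of an entropy-based candidate filter: low-entropy regions
# (flat background) are rejected as false text candidates.
import numpy as np

def image_entropy(gray_region):
    """Shannon entropy of an 8-bit grayscale region."""
    hist, _ = np.histogram(gray_region, bins=256, range=(0, 256))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def filter_candidates(gray_image, boxes, min_entropy=4.0):
    """Keep only candidate text boxes (x, y, w, h) with sufficient entropy."""
    kept = []
    for x, y, w, h in boxes:
        region = gray_image[y:y + h, x:x + w]
        if region.size and image_entropy(region) >= min_entropy:
            kept.append((x, y, w, h))
    return kept
```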
Signal-Image Technology and Internet-Based Systems | 2011
Haojin Yang; Maria Siebert; Patrick Lühne; Harald Sack; Christoph Meinel
The text displayed in a lecture video is closely related to the lecture content. Therefore, it provides a valuable source for indexing and retrieving lecture video contents. Textual content can be detected, extracted, and analyzed automatically by video OCR (Optical Character Recognition) techniques. In this paper, we present an approach for automated lecture video indexing based on video OCR technology: First, we developed a novel video segmenter for automated slide video structure analysis. Second, we perform text detection using a localization-and-verification scheme. We employ SWT (Stroke Width Transform) not only to remove false alarms from the text detection, but also to further analyze the slide structure. To recognize texts, a multi-hypotheses framework is adopted that consists of multiple text segmentation, OCR, spell-checking, and result-merging processes. Finally, we implemented a novel algorithm for slide structure analysis and extraction using the geometrical information of detected text lines. The accuracy of the proposed approach is demonstrated by evaluation.
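One plausible reading of the geometry-based slide structure analysis is sketched below; the position and height heuristics are assumptions for illustration, not the paper's algorithm.

```python
# Sketch: classify detected text lines into slide-structure levels
# using bounding-box geometry. Heuristic values are assumed.
def classify_text_lines(lines, frame_w, frame_h):
    """lines: list of (x, y, w, h, text) boxes from text detection."""
    if not lines:
        return []
    max_h = max(h for _, _, _, h, _ in lines)
    structured = []
    for x, y, w, h, text in lines:
        if y < 0.2 * frame_h and h > 0.8 * max_h:
            level = "title"          # tall line near the top
        elif x < 0.15 * frame_w:
            level = "content-1"      # small indent: top-level bullet
        else:
            level = "content-2"      # deeper indent: sub-item
        structured.append((level, text))
    return structured
```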
International Symposium on Multimedia | 2011
Haojin Yang; Maria Siebert; Patrick Lühne; Harald Sack; Christoph Meinel
In recent years, digital lecture libraries and lecture video portals have become more and more popular. However, finding efficient methods for indexing multimedia still remains a challenging task. Since the text displayed in a lecture video is closely related to the lecture content, it provides a valuable source for indexing and retrieving lecture contents. In this paper, we present an approach for automatic lecture video indexing based on video OCR technology. We have developed a novel video segmenter for automated slide video structure analysis and a weighted DCT (discrete cosine transform) based text detector. A dynamic image contrast/brightness adaptation serves the purpose of enhancing the text image quality to make it processable by existing common OCR software. Time-based text occurrence information as well as the analyzed text content are further used for indexing. We demonstrate the accuracy of the proposed approach by evaluation.
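The dynamic contrast/brightness adaptation could, for example, be realized as a percentile-based linear stretch, as sketched below; this is a plausible stand-in, since the paper's exact adaptation rule is not given in the abstract.

```python
# Sketch: stretch grayscale intensities to full range so that OCR
# receives a higher-contrast text image. Percentiles are assumed values.
import numpy as np

def adapt_contrast_brightness(gray, low_pct=2, high_pct=98):
    """Linearly stretch intensities between two percentiles to [0, 255]."""
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:
        return gray  # degenerate image; leave untouched
    stretched = (gray.astype(np.float32) - lo) * (255.0 / (hi - lo))
    return np.clip(stretched, 0, 255).astype(np.uint8)
```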
Multimedia Tools and Applications | 2016
Cheng Wang; Haojin Yang; Christoph Meinel
Multimodal representation learning has gained increasing importance in various real-world multimedia applications. Most previous approaches focused on exploring inter-modal correlation by learning a common or intermediate space in a conventional way, e.g. Canonical Correlation Analysis (CCA). These works neglected the exploration of fusing multiple modalities at a higher semantic level. In this paper, inspired by the success of deep networks in multimedia computing, we propose a novel unified deep neural framework for multimodal representation learning. To capture the high-level semantic correlations across modalities, we adopt deep learning features as the image representation and topic features as the text representation, respectively. For joint model learning, a 5-layer neural network is designed, with supervised pre-training enforced in the first 3 layers for intra-modal regularization. Extensive experiments on the benchmark Wikipedia and MIR Flickr 25K datasets show that our approach achieves state-of-the-art results compared to both shallow and deep models in multimodal and cross-modal retrieval.
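A minimal sketch of such a 5-layer joint network follows, assuming PyTorch; layer sizes and the class count are illustrative, and only the architecture is shown, not the supervised pre-training loop.

```python
# Sketch of a 5-layer joint multimodal network: three modality-specific
# layers (the ones pre-trained with supervision, per the abstract),
# followed by two fusion layers. Dimensions are assumed for illustration.
import torch
import torch.nn as nn

class JointMultimodalNet(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=200, hidden=1024, joint=256, n_classes=10):
        super().__init__()
        # Layers 1-3: modality-specific, targets of supervised pre-training.
        self.img_branch = nn.Sequential(
            nn.Linear(img_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, joint), nn.ReLU(),
        )
        self.txt_branch = nn.Sequential(
            nn.Linear(txt_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, joint), nn.ReLU(),
        )
        # Layers 4-5: fuse both modalities at a higher semantic level.
        self.fusion = nn.Sequential(
            nn.Linear(2 * joint, joint), nn.ReLU(),
            nn.Linear(joint, n_classes),
        )

    def forward(self, img_feat, txt_feat):
        z = torch.cat([self.img_branch(img_feat), self.txt_branch(txt_feat)], dim=1)
        return self.fusion(z)
```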
ACM Multimedia | 2013
Xiaoyin Che; Haojin Yang; Christoph Meinel
In this paper, we propose a solution that segments lecture videos by analyzing their supplementary synchronized slides. The slide content is derived automatically from an OCR (Optical Character Recognition) process with an approximate accuracy of 90%. We then partition the slides into different subtopics by examining their logical relevance. Since the slides are synchronized with the video stream, the subtopics of the slides indicate exactly the segments of the video. Our evaluation reveals that the average length of the segments for each lecture ranges from 5 to 15 minutes, and 45% of the segments obtained from the test datasets are logically reasonable.
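The subtopic partitioning by logical relevance might be approximated as below, using cosine similarity over the OCR word counts of consecutive slides; the similarity measure and threshold are assumptions, not the paper's method.

```python
# Sketch: start a new subtopic whenever consecutive slides' OCR text
# becomes lexically dissimilar. Threshold is an assumed value.
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def partition_slides(slide_texts, threshold=0.25):
    """slide_texts: one OCR string per slide; returns lists of slide indices."""
    if not slide_texts:
        return []
    segments, current = [], [0]
    for i in range(1, len(slide_texts)):
        prev = slide_texts[i - 1].lower().split()
        cur = slide_texts[i].lower().split()
        if cosine(prev, cur) >= threshold:
            current.append(i)      # same subtopic continues
        else:
            segments.append(current)
            current = [i]          # similarity dropped: new subtopic
    segments.append(current)
    return segments
```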
International Conference on Tools with Artificial Intelligence | 2015
Cheng Wang; Haojin Yang; Christoph Meinel
Cross-modal mapping plays an essential role in multimedia information retrieval systems. However, most existing work has paid much attention to learning mapping functions but neglected the exploration of high-level semantic representations of the modalities. Inspired by the recent success of deep learning, in this paper deep CNN (convolutional neural network) features and topic features are utilized as the visual and textual semantic representations, respectively. To investigate the highly non-linear semantic correlation between image and text, we propose a regularized deep neural network (RE-DNN) for semantic mapping across modalities. By imposing intra-modal regularization as supervised pre-training, we finally learn a joint model which captures both intra-modal and inter-modal relationships. Our approach is superior to previous work in the following respects: (1) it explores high-level semantic correlations, (2) it requires little prior knowledge for model training, and (3) it is able to tackle the missing-modality problem. Extensive experiments on the benchmark Wikipedia dataset show that RE-DNN outperforms state-of-the-art approaches in cross-modal retrieval.
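The missing-modality case can be illustrated as follows: given only an image, it is mapped into the joint space and texts are ranked by distance. The map_image and map_text functions are hypothetical stand-ins for the trained RE-DNN branches.

```python
# Sketch of cross-modal retrieval in a learned joint space with one
# modality missing at query time. Mapping functions are placeholders.
import numpy as np

def retrieve_texts(img_feat, map_image, map_text, text_feats, top_k=5):
    """Rank text items for an image-only query; no text input is needed."""
    q = map_image(img_feat)                           # image -> joint space
    db = np.stack([map_text(t) for t in text_feats])  # texts -> joint space
    dists = np.linalg.norm(db - q, axis=1)            # Euclidean distances
    return np.argsort(dists)[:top_k]                  # indices of nearest texts
```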
International Conference on Web-Based Learning | 2012
Haojin Yang; Christoph Oehlke; Christoph Meinel
This paper presents an automated framework for lecture video indexing in the tele-teaching context. The major issues involved in our approach are content-based lecture video analysis and the integration of the proposed analysis engine into a lecture video portal. In visual analysis, we apply automated video segmentation and video OCR (Optical Character Recognition) technologies for extracting lecture structural and textual metadata. Concerning ASR (Automated Speech Recognition) analysis, we have optimized the workflow for the creation of a German speech corpus from raw lecture audio data. This enables us to minimize the time and effort required for extending the speech corpus and thus improving the recognition rate. Both OCR and ASR results have been applied for further video indexing. In order to integrate the analysis engine into the lecture video portal, we have developed an architecture for the corresponding tasks, such as data transmission, analysis management, and result visualization. The accuracy of each individual analysis method has been evaluated using publicly available test data sets.
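The analysis-management side of such an architecture might be organized as in the sketch below; the job states and names are hypothetical, not the authors' actual portal API.

```python
# Sketch: each uploaded lecture video becomes a job that passes through
# transmission, analysis, and result stages. All names are hypothetical.
from dataclasses import dataclass, field
from queue import Queue

@dataclass
class AnalysisJob:
    video_id: str
    state: str = "queued"                      # queued -> transferred -> analyzed
    metadata: dict = field(default_factory=dict)

def run_pipeline(jobs: Queue, analyze):
    """Drain the job queue: transfer data, run OCR/ASR, store results."""
    while not jobs.empty():
        job = jobs.get()
        job.state = "transferred"              # data transmission to the engine
        job.metadata = analyze(job.video_id)   # OCR/ASR indexing results
        job.state = "analyzed"                 # ready for result visualization
```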
Annual ACIS International Conference on Computer and Information Science | 2011
Haojin Yang; Christoph Oehlke; Christoph Meinel
Since recording technology has become more robust and easier to use, more and more universities are taking the opportunity to record their lectures and put them on the Web in order to make them accessible to students. Automatic speech recognition (ASR) techniques provide a valuable source for indexing and retrieval of lecture video materials. In this paper, we evaluate state-of-the-art speech recognition software to find a solution for the automatic transcription of German lecture videos. Our experimental results show that the word error rate (WER) was reduced by 12.8% when the speech training corpus of a lecturer was increased by 1.6 hours.
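For reference, the word error rate used in this evaluation is the word-level edit distance normalized by the reference length; a standard implementation (not code from the paper) is sketched below.

```python
# Standard WER: minimum word-level edit distance between reference and
# hypothesis transcripts, divided by the reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```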
international conference on web based learning | 2013
Haojin Yang; Franka Grünewald; Matthias Bauer; Christoph Meinel
In the last decade, e-lecturing has become more and more popular. The amount of lecture video data on the World Wide Web (WWW) is growing rapidly. Therefore, a more efficient method for video retrieval on the WWW or within large lecture video archives is urgently needed. This paper presents an approach for automated video indexing and video search in large lecture video archives. First, we apply automatic video segmentation and key-frame detection to offer a visual guideline for video content navigation. Subsequently, we extract textual metadata by applying video Optical Character Recognition (OCR) technology on key-frames and by performing Automatic Speech Recognition (ASR) on lecture audio tracks. The OCR and ASR transcripts as well as detected slide text line types are adopted for keyword extraction, by which both video- and segment-level keywords are extracted. Furthermore, we developed a content-based video search function and conducted a user study to evaluate the performance and effectiveness of the proposed indexing methods in our lecture video archive.
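A segment-level keyword search of the kind described here could be backed by an inverted index, as in the following sketch; the data structures are illustrative, since the paper's index format is not given in the abstract.

```python
# Sketch: inverted index from extracted keywords to (video, segment)
# locations, supporting conjunctive keyword queries.
from collections import defaultdict

def build_index(segment_keywords):
    """segment_keywords: {(video_id, segment_id): [keyword, ...]}"""
    index = defaultdict(set)
    for loc, keywords in segment_keywords.items():
        for kw in keywords:
            index[kw.lower()].add(loc)
    return index

def search(index, query):
    """Return (video, segment) hits containing every query term."""
    terms = [t.lower() for t in query.split()]
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for t in terms[1:]:
        hits &= index.get(t, set())
    return hits
```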