Vasileios Chasanis
University of Ioannina
Publications
Featured research published by Vasileios Chasanis.
IEEE Transactions on Multimedia | 2009
Vasileios Chasanis; Aristidis Likas; Nikolaos P. Galatsanos
Video indexing requires the efficient segmentation of video into scenes. The video is first segmented into shots and a set of key-frames is extracted for each shot. Typical scene detection algorithms incorporate time distance in a shot similarity metric. In the method we propose, to overcome the difficulty of having prior knowledge of the scene duration, the shots are clustered into groups based only on their visual similarity and a label is assigned to each shot according to the group that it belongs to. Then, a sequence alignment algorithm is applied to detect when the pattern of shot labels changes, providing the final scene segmentation result. In this way shot similarity is computed based only on visual features, while ordering of shots is taken into account during sequence alignment. To cluster the shots into groups we propose an improved spectral clustering method that both estimates the number of clusters and employs the fast global k-means algorithm in the clustering stage after the eigenvector computation of the similarity matrix. The same spectral clustering method is applied to extract the key-frames of each shot and numerical experiments indicate that the content of each shot is efficiently summarized using the method we propose herein. Experiments on TV-series and movies also indicate that the proposed scene detection method accurately detects most of the scene boundaries while preserving a good tradeoff between recall and precision.
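The label-change detection step above can be pictured with a minimal sketch. This is an illustrative stand-in, not the authors' sequence-alignment scoring: it simply flags positions where the sets of cluster labels before and after a candidate boundary are disjoint, and the `window` parameter is an assumption of the sketch.

```python
def scene_boundaries(shot_labels, window=3):
    """Flag a scene boundary wherever the pattern of shot (cluster)
    labels changes: the labels seen in a window before a candidate
    cut share nothing with the labels in the window after it."""
    boundaries = []
    for i in range(window, len(shot_labels) - window + 1):
        before = set(shot_labels[i - window:i])
        after = set(shot_labels[i:i + window])
        if not before & after:  # disjoint label sets -> new scene
            boundaries.append(i)
    return boundaries
```

For example, the label sequence `[0, 1, 0, 1, 0, 2, 3, 2, 3, 2]` yields a single boundary at shot index 5, where the alternating 0/1 pattern gives way to 2/3.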
Pattern Recognition Letters | 2009
Vasileios Chasanis; Aristidis Likas; Nikolaos P. Galatsanos
Video shot detection is an important contemporary problem since it is the first step towards indexing and content-based video retrieval. Traditionally, video shot segmentation approaches rely on thresholding methodologies which are sensitive to the content of the video being processed and do not generalize well when there is little prior knowledge about the video content. To ameliorate this shortcoming we propose a learning-based methodology using a set of features that are specifically designed to capture the differences among hard cuts, gradual transitions and normal sequences of frames at the same time. A support vector machine (SVM) classifier is trained both to locate shot boundaries and characterize transition types. Numerical experiments using a variety of videos demonstrate that our method is capable of accurately discriminating shot transitions in videos with different characteristics.
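One way to picture the kind of window-based feature the abstract describes (a hedged sketch; the histogram distance, bin count and window size are illustrative assumptions, not the paper's exact features): a hard cut produces one large frame-to-frame distance inside the window, while a gradual transition spreads moderate distances across several frame pairs, and an SVM is trained on such vectors to separate both from normal sequences.

```python
def histogram(frame, bins=8):
    """Normalized grayscale histogram; frame is an iterable of
    pixel intensities in 0..255."""
    h = [0] * bins
    for p in frame:
        h[min(p * bins // 256, bins - 1)] += 1
    total = float(len(frame)) or 1.0
    return [c / total for c in h]

def boundary_features(frames, center, window=2, bins=8):
    """Feature vector for a candidate boundary at `center`: L1
    histogram distances between successive frames in the window
    around it. A hard cut yields one large distance; a gradual
    transition yields several moderate ones."""
    hists = [histogram(f, bins)
             for f in frames[center - window:center + window + 1]]
    return [sum(abs(a - b) for a, b in zip(h1, h2))
            for h1, h2 in zip(hists, hists[1:])]
```

Such vectors would then be labeled (cut / gradual / normal) and passed to an SVM trainer, e.g. scikit-learn's `SVC`.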
Proceedings of the 2nd ACM TRECVid Video Summarization Workshop | 2008
Vasileios Chasanis; Aristidis Likas; Nikolaos P. Galatsanos
In this paper we describe a system for video rushes summarization. Rushes videos pose three basic problems: first, the presence of useless frames such as colorbars, monochrome frames and frames containing clapboards; second, the repetition of similar segments produced from multiple takes of the same scene; and finally, the efficient representation of the original video in the video summary. In the method we propose herein, the input video is segmented into shots. Then, colorbars and monochrome frames are removed by checking their edge direction histograms, whereas frames containing clapboards are removed by checking their SIFT descriptors. Next, an enhanced spectral clustering algorithm that both estimates the number of clusters and employs the fast global k-means algorithm in the clustering stage after the eigenvector computation of the similarity matrix is used to extract the key-frames of each shot, so as to efficiently represent shot content. Similar shots are clustered into one group by comparing their key-frames using a sequence alignment algorithm. Each group is represented by the shot with the largest duration, and the final video summary is generated by concatenating frames around the key-frames of each shot. Experiments on TRECVID 2008 Test Data indicate that our method exhibits good performance.
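The useless-frame filtering step might look roughly like the following sketch; a plain horizontal-gradient threshold stands in here for the paper's edge direction histogram check, and both thresholds are illustrative assumptions.

```python
def is_monochrome(frame, grad_thresh=10, edge_frac=0.01):
    """Flag frames with almost no edge content (e.g. monochrome
    filler frames): count pixel pairs whose horizontal gradient
    exceeds grad_thresh, and flag the frame if the fraction of
    such edge pixels is below edge_frac.
    frame: 2-D list of grayscale values in 0..255."""
    edges = 0
    total = 0
    for row in frame:
        for a, b in zip(row, row[1:]):
            total += 1
            if abs(a - b) > grad_thresh:
                edges += 1
    return total == 0 or edges / total < edge_frac
```

A flat gray frame is flagged for removal, while a textured frame with many strong gradients is kept.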
IEEE Workshop on Multimedia Signal Processing | 2007
Vasileios Chasanis; Aristidis Likas; Nikolaos P. Galatsanos
Video indexing requires the efficient segmentation of the video into scenes. In the method we propose, the video is first segmented into shots, and key-frames that represent each shot are extracted using the global k-means clustering algorithm. Then an improved spectral clustering method is applied to cluster the shots into groups based on visual similarity, and a label is assigned to each shot according to the group it belongs to. Next, a method for segmenting the sequence of shot labels is applied, providing the final scene segmentation result. Numerical experiments indicate that the method we propose correctly detects most of the scene boundaries while preserving a good trade-off between recall and precision.
Pattern Recognition Letters | 2016
Antonis Ioannidis; Vasileios Chasanis; Aristidis Likas
Highlights:
- We propose an efficient method for key-frame extraction where different image descriptors (views) are combined to capture different aspects of video frames.
- A weighted multi-view clustering algorithm based on Convex Mixture Models is employed to automatically assign a weight to each descriptor.
- A similarity matrix is built using these weights and is used as input to a spectral clustering algorithm to provide the final partitioning of the frames into groups.
- To the best of our knowledge, this is the first key-frame extraction method capable of combining several image descriptors and estimating the importance of each descriptor.

The extraction of representative key-frames from video shots is very important in video processing and analysis, since it constitutes the basis for several important tasks such as video shot summarization, browsing and retrieval, as well as high-level video segmentation. The extracted key-frames should capture a great percentage of the information of a shot's content, while at the same time they should not present similar visual information. Clustering or segmentation methods are usually employed to extract key-frames. A major difficulty is caused by the large variety in the visual content of videos. Thus, using a single image descriptor (color, texture, etc.) to extract key-frames is not always effective, since there is no single descriptor surpassing the others in all video cases. To tackle this problem, we propose an approach for the weighted fusion of several descriptors that automatically estimates the weight of each descriptor. The weights reflect the relevance of each descriptor for the specific video shot. Moreover, they are used to form a composite similarity matrix as the weighted sum of all the similarity matrices corresponding to the individual descriptors. This matrix is then used as input to a spectral clustering algorithm that partitions shot frames into groups. Finally, the medoid frame of each group is selected as a key-frame.
Numerical experiments using a variety of videos demonstrate that our method is capable of efficiently summarizing video shots regardless of the characteristics of the visual content of a video.
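The fusion step follows directly from the description above: the composite matrix is the weighted sum of the per-descriptor similarity matrices. The sketch below assumes the weights are already estimated and normalized (the Convex Mixture Model weight learning itself is omitted).

```python
def fuse_similarities(sim_matrices, weights):
    """Composite similarity matrix as the weighted sum of the
    per-descriptor (per-view) similarity matrices. Assumes all
    matrices are n x n and the weights sum to 1."""
    n = len(sim_matrices[0])
    fused = [[0.0] * n for _ in range(n)]
    for w, S in zip(weights, sim_matrices):
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * S[i][j]
    return fused
```

The fused matrix would then be handed to a spectral clustering routine, e.g. scikit-learn's `SpectralClustering` with `affinity='precomputed'`.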
International Conference on Pattern Recognition | 2014
Antonis Ioannidis; Vasileios Chasanis; Aristidis Likas
Reliable video summarization is one of the most important problems in digital video processing and analysis. The most common approach to shot representation is the extraction of a set of key-frames sufficiently representing the total content of the shot. In this way, the whole video content can be represented using only a few, carefully picked, non-redundant key-frames while maintaining a great percentage of the information. A typical approach is to extract key-frames using clustering. However, using a single image descriptor to extract key-frames is not sufficient due to large variations in the visual content of videos. In our approach, a weighted multi-view clustering algorithm is employed to combine two different image descriptors into a single similarity matrix that serves as input to a spectral clustering algorithm. Each image descriptor (view) does not contribute equally to the similarity matrix; rather, the weighted multi-view clustering algorithm associates a weight with each view and learns these weights automatically. Numerical experiments using a variety of videos demonstrate that our method is capable of efficiently summarizing video shots regardless of the characteristics of the visual content of the video.
International Conference on Artificial Neural Networks | 2008
Vasileios Chasanis; Aristidis Likas; Nikolaos P. Galatsanos
Video summarization is a powerful tool to handle the huge amount of data generated every day. At shot level, the key-frame extraction problem provides sufficient indexing and browsing of large video databases. In this paper we propose an approach that estimates the number of key-frames using elements of the spectral graph theory. Next, the frames of the video sequence are clustered into groups using an improved version of the spectral clustering algorithm. Experimental results show that our algorithm efficiently summarizes the content of a video shot producing unique and representative key-frames outperforming other methods.
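The cluster-number estimation can be illustrated with the standard eigengap heuristic from spectral graph theory (a simplified sketch; the paper's exact criterion may differ): given the eigenvalues of the frame similarity matrix sorted in descending order, pick the number of clusters at the largest gap between successive eigenvalues.

```python
def estimate_num_clusters(eigenvalues):
    """Eigengap heuristic: `eigenvalues` are the eigenvalues of the
    (normalized) similarity matrix sorted in descending order. The
    number of clusters is chosen at the largest gap between
    successive eigenvalues."""
    gaps = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]
    return gaps.index(max(gaps)) + 1
```

Three eigenvalues near 1 followed by a sharp drop, for instance, suggest three clusters, hence three key-frames.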
International Conference on Signal Processing | 2014
Vasileios Chasanis; Antonis Ioannidis; Aristidis Likas
Key-frame extraction for shot representation is the most common video summarization approach. Any reliable key-frame extraction algorithm should automatically detect the number of key-frames, while extracting non-repetitive key-frames that can efficiently summarize the video content. Moreover, it is important that key-frame extraction is performed in reasonable time. The proposed method is based on a moving window of successive frames that slides over the whole frame sequence (shot). The set of frames included in each window is tested for content homogeneity using an appropriate unimodality test. Thus, each window is characterized as unimodal or not, and the frame sequence of each non-unimodal window is split into two (possibly unimodal) segments. In this way, each video shot is segmented into unimodal segments and the key-frames are computed as the representative frames (medoids) of each unimodal segment. An important aspect of the above method is that it does not require the number of key-frames to be specified in advance, since the number of segments is computed automatically. Numerical experiments demonstrate that our method provides reasonable estimates of the number of ground-truth key-frames, while extracting non-repetitive key-frames that efficiently summarize the content of each shot.
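A simplified sketch of the split-until-unimodal idea follows. The range-based spread test below is an illustrative stand-in for a proper unimodality test, and the 1-D frame descriptors and threshold are assumptions of the sketch; the real method operates on high-dimensional frame features.

```python
def segment_unimodal(values, spread_thresh=1.0):
    """Greedily split a 1-D sequence of frame descriptors wherever
    a segment's spread exceeds a threshold (stand-in for the
    unimodality test), then return one representative (here: the
    median, playing the role of the medoid) per segment."""
    segments = [values]
    result = []
    while segments:
        seg = segments.pop(0)
        if len(seg) < 2 or max(seg) - min(seg) <= spread_thresh:
            result.append(seg)  # homogeneous enough: keep as-is
        else:
            # split at the largest jump between consecutive values
            jumps = [abs(b - a) for a, b in zip(seg, seg[1:])]
            cut = jumps.index(max(jumps)) + 1
            segments = [seg[:cut], seg[cut:]] + segments
    return [sorted(s)[len(s) // 2] for s in result]
```

Note that the number of key-frames falls out of the splitting itself; it is never specified in advance.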
Archive | 2013
Argyris Kalogeratos; Vasileios Chasanis; G. Rakocevic; Aristidis Likas; Z. Babovic; M. Novakovic
The prerequisite of any machine learning or data mining application is to have a clear target variable that the system will try to learn. In a supervised setting, we also need to know the value of this target variable for a set of training examples (i.e., patient records). In the case study presented in this chapter, the value of the considered target variable that can be used for training is the ground truth characterizations of the coronary artery disease severity or, as a different scenario, the progression of the patients. We either set as target variable the disease severity, or disease progression, and then we consider a two-class problem in which we aim to discriminate a group of patients that are characterized as “severely diseased” or “severely progressed,” from a second group containing “mildly diseased” or “mildly progressed” patients, respectively. This latter mild/severe characterization is the actual value of the target variable for each patient.
New Directions in Intelligent Interactive Multimedia | 2008
Vasileios Chasanis; Aristidis Likas; Nikolaos P. Galatsanos
The first step towards indexing and content-based video retrieval is video shot detection. Existing methodologies for video shot detection are mostly threshold dependent; since they assume no prior knowledge about the video content, such methods are sensitive to the content of the video being processed. To ameliorate this shortcoming we propose a learning-based methodology using a set of features that are specifically designed to capture the differences among hard cuts, gradual transitions and normal sequences of frames simultaneously. A Support Vector Machine (SVM) classifier is trained both to locate shot boundaries and characterize transition types. Numerical experiments using a variety of videos demonstrate that our method is capable of accurately detecting and discriminating shot transitions in videos with different characteristics.