Publication


Featured research published by Ionut Mironica.


international conference on multimedia retrieval | 2013

A naive mid-level concept-based fusion approach to violence detection in Hollywood movies

Bogdan Ionescu; Jan Schlüter; Ionut Mironica; Markus Schedl

In this paper we approach the issue of violence detection in typical Hollywood productions. Given the high variability in appearance of violent scenes in movies, training a classifier to predict violent frames directly from visual and/or auditory features seems rather difficult. Instead, we propose a different perspective that relies on fusing mid-level concept predictions that are inferred from low-level features. This is achieved by employing a bank of multi-layer perceptron classifiers featuring a dropout training scheme. Experimental validation conducted in the context of the Violent Scenes Detection task of the MediaEval 2012 Multimedia Benchmark Evaluation shows the potential of this approach, which ranked first among 34 other submissions in terms of precision and F1-score.


acm multimedia | 2013

Time matters!: capturing variation in time in video using fisher kernels

Ionut Mironica; Jasper R. R. Uijlings; Negar Rostamzadeh; Bogdan Ionescu; Nicu Sebe

In video, global features are often used for reasons of computational efficiency, where each global feature captures information of a single video frame. But frames in video change over time, so an important question is: how can we meaningfully aggregate frame-based features in order to preserve the variation in time? In this paper we propose to use the Fisher Kernel to capture variation in time in video. While in this approach the temporal order is lost, it captures both subtle variations in time, such as the ones caused by a moving bicycle, and drastic variations in time, such as the changing of shots in a documentary. Our work should not be confused with a Bag of Local Visual Features approach, where one captures the visual variation of local features in both time and space indiscriminately. Instead, each feature measures a complete frame; hence we capture variation in time only. We show that our framework is highly general, reporting improvements using frame-based visual features, body-part features, and audio features on three diverse datasets: we obtain state-of-the-art results on the UCF50 human action dataset and improve the state-of-the-art on the MediaEval 2012 video-genre benchmark and on the ADL daily activity recognition dataset.


content based multimedia indexing | 2015

VSD2014: A dataset for violent scenes detection in hollywood movies and web videos

Markus Schedl; Mats Sjöberg; Ionut Mironica; Bogdan Ionescu; Vu Lam Quang; Yu-Gang Jiang; Claire-Hélène Demarty

In this paper, we introduce a violent scenes and violence-related concept detection dataset named VSD2014. It contains annotations as well as auditory and visual features of Hollywood movies and user-generated footage shared on the web. The dataset is the result of a joint annotation endeavor of different research institutions and responds to the real-world use case of parental guidance in selecting appropriate content for children. The dataset has been validated during the Violent Scenes Detection (VSD) task at the MediaEval benchmarking initiative for multimedia evaluation.


international conference on multimedia retrieval | 2013

Fisher kernel based relevance feedback for multimodal video retrieval

Ionut Mironica; Bogdan Ionescu; Jasper R. R. Uijlings; Nicu Sebe

This paper proposes a novel approach to relevance feedback based on the Fisher Kernel representation in the context of multimodal video retrieval. The Fisher Kernel representation describes a set of features as the derivative with respect to the log-likelihood of the generative probability distribution that models the feature distribution. In the context of relevance feedback, instead of learning the generative probability distribution over all features of the data, we learn it only over the top retrieved results. Hence during relevance feedback we create a new Fisher Kernel representation based on the most relevant examples. In addition, we propose to use the Fisher Kernel to capture temporal information by cutting a video into smaller segments, extracting a feature vector from each segment, and representing the resulting feature set using the Fisher Kernel representation. We evaluate our method on the MediaEval 2012 Video Genre Tagging Task, a large dataset that contains 26 categories in 15,000 videos totalling up to 2,000 hours of footage. Results show that our method significantly improves results over existing state-of-the-art relevance feedback techniques. Furthermore, we show significant improvements by using the Fisher Kernel to capture temporal information, and we demonstrate that Fisher kernels are well suited for this task.
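
To make the Fisher Kernel representation described above more concrete, here is a minimal sketch in Python. It assumes a diagonal-covariance Gaussian mixture model as the generative distribution and computes only the gradient with respect to the component means, using a commonly used normalization; the function name `fisher_vector_means` and all parameter names are illustrative and are not taken from the paper, which may use a different formulation.

```python
import math


def fisher_vector_means(features, weights, means, variances):
    """Encode a set of feature vectors as the normalized gradient of the
    average log-likelihood w.r.t. the means of a diagonal GMM.

    features:  list of d-dimensional vectors (e.g. one per video segment)
    weights:   k mixture weights (assumed to sum to 1)
    means:     k mean vectors of length d
    variances: k variance vectors of length d (diagonal covariances)
    """
    k, d, n = len(weights), len(means[0]), len(features)
    grad = [[0.0] * d for _ in range(k)]
    for x in features:
        # Log of each weighted Gaussian density at x.
        log_p = []
        for j in range(k):
            lp = math.log(weights[j])
            for t in range(d):
                var = variances[j][t]
                lp -= 0.5 * math.log(2.0 * math.pi * var)
                lp -= 0.5 * (x[t] - means[j][t]) ** 2 / var
            log_p.append(lp)
        # Posterior probability of each component (softmax of log_p).
        top = max(log_p)
        norm = sum(math.exp(lp - top) for lp in log_p)
        post = [math.exp(lp - top) / norm for lp in log_p]
        # Accumulate the posterior-weighted, variance-scaled residuals.
        for j in range(k):
            for t in range(d):
                sigma = math.sqrt(variances[j][t])
                grad[j][t] += post[j] * (x[t] - means[j][t]) / sigma
    # Normalize by the number of features and the square root of the weight.
    return [g / (n * math.sqrt(weights[j])) for j in range(k) for g in grad[j]]


# Single-component toy model: the encoding is the mean residual.
print(fisher_vector_means([[1.0], [3.0]], [1.0], [[0.0]], [[1.0]]))  # [2.0]
```

The output has a fixed length (number of components times feature dimension) regardless of how many segment features are passed in, and it does not depend on the order of the segments.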


content based multimedia indexing | 2013

An in-depth evaluation of multimodal video genre categorization

Ionut Mironica; Bogdan Ionescu; Peter Knees; Patrick Lambert

In this paper we propose an in-depth evaluation of the performance of video descriptors for multimodal video genre categorization. We discuss the perspective of designing appropriate late fusion techniques that would make it possible to attain very high categorization accuracy, close to the one achieved with user-based text information. Evaluation is carried out in the context of the 2012 Video Genre Tagging Task of the MediaEval Benchmarking Initiative for Multimedia Evaluation, using a data set of up to 15,000 videos (3,200 hours of footage) and 26 video genre categories specific to web media. Results show that the proposed approach significantly improves genre categorization performance, outperforming other existing approaches. The main contribution of this paper is in the experimental part, where several valuable findings are reported that motivate further research on video genre classification.
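
As an illustration of what late fusion means here, the following Python sketch combines per-modality classifier scores with a weighted average. The abstract does not say which fusion rules were evaluated, so the weighted average, the function name `late_fusion`, and the modality and genre names are all assumptions chosen for the example.

```python
def late_fusion(scores_by_modality, weights):
    """Fuse per-modality class scores with a weighted average.

    scores_by_modality: {modality: {class_name: score}}
    weights:            {modality: non-negative weight}
    """
    total = sum(weights[m] for m in scores_by_modality)
    classes = next(iter(scores_by_modality.values())).keys()
    return {
        c: sum(weights[m] * s[c] for m, s in scores_by_modality.items()) / total
        for c in classes
    }


# Hypothetical scores from two independently trained classifiers.
scores = {
    "audio": {"news": 0.2, "music": 0.8},
    "visual": {"news": 0.6, "music": 0.4},
}
fused = late_fusion(scores, {"audio": 1.0, "visual": 1.0})
best = max(fused, key=fused.get)
```

Each modality is classified on its own and only the decisions are merged, which is what distinguishes late fusion from concatenating the descriptors before training.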


international conference on intelligent computer communication and processing | 2014

Multiple instance-based object retrieval in video surveillance: Dataset and evaluation

Catalin Alexandru Mitrea; Ionut Mironica; Bogdan Ionescu; Radu Dogaru

In this paper we propose a classification-based automated surveillance system for the multiple-instance object retrieval task, whose main purpose is to track a list of persons in several video sources using only a few training frames. We discuss the perspective of designing appropriate motion detectors, feature extraction and classification techniques that would make it possible to attain high categorization accuracy and a low percentage of false negatives. Evaluation is carried out on a newly proposed dataset, the Scouter dataset, which contains approximately 36,000 annotated frames. The proposed dataset contains 10 video sources, with variable lighting conditions and different levels of difficulty. The video database raises several challenges, such as noise, low image quality and blurring, which increase the difficulty of its analysis. A further contribution of this paper is in the experimental part, where several valuable findings are reported that motivate further research on automated surveillance algorithms. The combination and calibration of appropriate motion detectors, feature extractors and classifiers yields high recall performance.


international conference on image analysis and processing | 2013

Daily Living Activities Recognition via Efficient High and Low Level Cues Combination and Fisher Kernel Representation

Negar Rostamzadeh; Gloria Zen; Ionut Mironica; Jasper R. R. Uijlings; Nicu Sebe

In this work we propose an efficient method for activity recognition in a daily living scenario. At feature level, we propose a method to extract and combine low- and high-level information and we show that the performance of body pose estimation (and consequently of activity recognition) can be significantly improved. In particular, we propose an approach extending the state-of-the-art pictorial deformable models for body pose estimation. We show that including low-level cues (e.g. optical flow and foreground) together with an off-the-shelf body part detector allows reaching better performance without the need to re-train the detectors. Finally, we apply the Fisher Kernel representation that takes the temporal variation into account and we show that we outperform state-of-the-art methods on a public dataset with daily living activities.


international symposium on signals, circuits and systems | 2013

Background invariant static hand gesture recognition based on Hidden Markov Models

Radu-Laurentiu Vieriu; Ionut Mironica; Bogdan-Tudor Goras

This paper addresses the problem of Static Hand Gesture Recognition (SHGR) and proposes a fast yet simple solution based on Discrete Hidden Markov Models (DHMMs) that use features extracted from the hand contours. In addition to previous work, the use of depth information makes the overall system robust and background invariant. Experiments carried out on a challenging noisy dataset reveal the superior discriminating and generalizing abilities of statistical models when compared to state-of-the-art methods.


international symposium on signals, circuits and systems | 2015

Fast Support Vector Classifier for automated content-based search in video surveillance

Catalin Alexandru Mitrea; Ionut Mironica; Bogdan Ionescu; Radu Dogaru

In this article we present and test a specialized classifier, i.e., Fast Support Vector Classifier (FSVC), which is employed for multiple-instance human retrieval in video surveillance. Thanks to its low complexity and high performance in terms of computation and speed, FSVC is adapted to ease the generalization of the feature space using only a limited number of samples in the training process. To validate the performance, FSVC is evaluated on two standard video surveillance datasets. It obtains superior or similar results in terms of F2-Score compared to closely related state-of-the-art Support Vector Machine approaches.
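
For readers unfamiliar with the F2-Score mentioned above, the following Python sketch computes the general F-beta measure from detection counts; with beta = 2, recall is weighted more heavily than precision. The function name `f_beta` and the example counts are illustrative only and are not results from the paper.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical retrieval run: high precision (0.8) but modest recall (0.5).
f2 = f_beta(tp=8, fp=2, fn=8, beta=2.0)
f1 = f_beta(tp=8, fp=2, fn=8, beta=1.0)
```

In this example F2 comes out lower than F1 because the missed detections count for more, which suits surveillance retrieval where false negatives are costly.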


international conference on multimedia and expo | 2015

Beyond Bag-of-Words: Fast video classification with Fisher Kernel Vector of Locally Aggregated Descriptors

Ionut Mironica; Ionut C. Duta; Bogdan Ionescu; Nicu Sebe

In this paper we introduce a new video description framework that replaces traditional Bag-of-Words with a combination of Fisher Kernels (FK) and Vector of Locally Aggregated Descriptors (VLAD). The main contributions are: (i) a fast algorithm to densely extract global frame features, easier and faster to compute than spatio-temporal local features; (ii) replacing the traditional k-means based vocabulary with a Random Forest approach that allows significant speedup; (iii) use of a modified VLAD and FK representation to replace the classic Bag-of-Words and obtain better performance. We show that our framework is highly general and is not dependent on a particular type of descriptor. It achieves state-of-the-art results in several classification scenarios.
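
The following Python sketch shows the classic VLAD encoding that the abstract builds on: each descriptor is assigned to its nearest vocabulary centroid, the residuals are summed per centroid, and the result is L2-normalized. This is the textbook variant with a given set of centroids, not the modified VLAD or the Random Forest vocabulary proposed in the paper; the function name `vlad` and the toy data are assumptions.

```python
import math


def vlad(descriptors, centroids):
    """Classic VLAD: sum of residuals to the nearest centroid, L2-normalized."""
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard assignment to the nearest centroid (squared Euclidean distance).
        j = min(
            range(k),
            key=lambda c: sum((x[t] - centroids[c][t]) ** 2 for t in range(d)),
        )
        for t in range(d):
            acc[j][t] += x[t] - centroids[j][t]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy example: three frame descriptors and a two-word vocabulary.
code = vlad([[1.0, 0.0], [0.0, 1.0], [10.0, 13.0]], [[0.0, 0.0], [10.0, 10.0]])
```

Unlike a Bag-of-Words histogram, which only counts assignments, the encoding keeps how far and in which direction the descriptors deviate from each vocabulary entry.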

Collaboration

Top Co-Authors

Bogdan Ionescu
Politehnica University of Bucharest

Constantin Vertan
Politehnica University of Bucharest

Markus Schedl
Johannes Kepler University of Linz

Catalin Alexandru Mitrea
Politehnica University of Bucharest

Bogdan Boteanu
Politehnica University of Bucharest

Radu Dogaru
Politehnica University of Bucharest