
Publications


Featured research published by Sujeet Shyamsundar Mate.


Mobile and Ubiquitous Multimedia | 2006

Movable-multimedia: session mobility in ubiquitous computing ecosystem

Sujeet Shyamsundar Mate; Umesh Chandra; Igor Danilo Diego Curcio

IP-based multimedia creation and consumption is becoming available on an increasing spectrum of devices, ranging from low-powered portable devices such as cell phones and PDAs to high-powered static devices such as desktop PCs and IPTVs. High-speed wireless and wire-line network access is becoming widespread, and multimedia services such as IPTV, video-on-demand and video conferencing are becoming mainstream. These developments have paved the way for innovative concepts such as session mobility for multimedia applications. Session mobility enables the seamless transfer of an ongoing multimedia session between different devices based on user preferences. We examine the motivations and use cases for session mobility. In this paper we evaluate architectural design constraints and propose architectural solutions for session mobility. We also evaluate current session mobility mechanisms and propose requirements for new ones.
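
To make the session mobility idea above concrete, the following is a minimal, hypothetical sketch of the state that would have to move between devices when a session is transferred. The class and function names (MediaSession, transfer_to) are illustrative and not from the paper, which discusses signalling-level mechanisms rather than a specific API.

```python
# Hypothetical sketch of session mobility: the transferable session state
# (media URI, playback position, participants) is serialized on the source
# device and resumed on the target device. Names are illustrative only.
from dataclasses import dataclass, asdict
import json


@dataclass
class MediaSession:
    media_uri: str       # stream being consumed, e.g. an IPTV channel
    position_s: float    # current playback position in seconds
    participants: list   # other parties in a conferencing session

    def snapshot(self) -> str:
        """Serialize the transferable session state."""
        return json.dumps(asdict(self))


class Device:
    def __init__(self, name: str):
        self.name = name

    def resume(self, state: dict) -> None:
        """Resume the transferred session on this device."""
        print(f"{self.name}: resuming {state['media_uri']} at {state['position_s']:.1f}s")


def transfer_to(target: Device, snapshot: str) -> None:
    """Hand the serialized session over to another device.

    In a real system this would be a signalling message (e.g. a SIP-level
    transfer); here the 'device' is just an object with a resume() method.
    """
    target.resume(json.loads(snapshot))


if __name__ == "__main__":
    phone_session = MediaSession("rtsp://example.org/iptv/ch1", 812.4, ["alice"])
    transfer_to(Device("living-room TV"), phone_session.snapshot())
```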


Human Factors in Computing Systems | 2011

We want more: human-computer collaboration in mobile social video remixing of music concerts

Sami Vihavainen; Sujeet Shyamsundar Mate; Lassi Seppälä; Francesco Cricri; Igor Danilo Diego Curcio

Recording and publishing mobile video clips from music concerts is popular, and there is high potential to increase a concert's perceived value by producing video remixes from individual video clips and using them socially. The digital production of a video remix is an interactive process between human and computer. However, it is not clear what the implications of this human-computer collaboration are. We present a case study in which we compare the processes and products of manual and automatic mobile video remixing, and we provide results from the first systematic real-world study of the subject. We draw our observations from a user trial in which fans recorded mobile video clips during a rock concert. The results reveal issues concerning the heterogeneous interests of the stakeholders, unexpected uses of the raw material, the burden of editing, diverse quality requirements, motivations for remixing, the effect of understanding the logic of automation, and the collaborative use of manual and automatic remixing.


IEEE Communications Magazine | 2009

Mobile and interactive social television

Sujeet Shyamsundar Mate; Igor Danilo Diego Curcio

Services that were traditionally designed for a static environment can now be implemented on mobile devices. At the same time, services with traditionally passive, consumption-oriented paradigms are moving toward participative and interactive services. One such service is mobile and interactive social TV (MIST), which allows geographically dispersed people to meet in a virtual shared space and watch TV while being able to interact with each other. This service creates an experience of watching together by providing its participants a common shared context. We present two novel architectures for a MIST system. In both architectures, the interaction is represented by rich audio-visual media, allowing users to hear and see each other. In the first architecture, the mixing of the TV content with the interaction media is performed at the server side; in the second, the mixing is performed in each client device. Many questions arise from the consumer perspective regarding such a radical change in experience compared to traditional laid-back TV watching, as mobile and interactive social TV is relatively new compared to TV watching in a static context. To develop an understanding of the consumer experience with the MIST concept, a focus group study was conducted. The study revealed that the feeling of social presence of people of interest while watching content together was considered to add value to the viewing experience. The key system requirement is the ability to selectively enable or disable individual interaction features according to user preferences and context, where the context is influenced both by the relation with the other participants and by the content being consumed.


Multimedia Tools and Applications | 2014

Multimodal extraction of events and of information about the recording activity in user generated videos

Francesco Cricri; Kostadin Dabov; Igor Danilo Diego Curcio; Sujeet Shyamsundar Mate; Moncef Gabbouj

In this work we propose methods that exploit context sensor data modalities for detecting interesting events and extracting high-level contextual information about the recording activity in user generated videos. Most camera-enabled electronic devices contain various auxiliary sensors such as accelerometers, compasses, GPS receivers, etc. Data captured by these sensors during media acquisition have already been used to limit camera degradations such as shake and to provide basic tagging information such as the location. However, exploiting the sensor modality for subsequent higher-level information extraction, such as detecting interesting events, has been the subject of rather limited research, further constrained to specialized acquisition setups. In this work, we show how these sensor modalities allow inferring information (camera movements, content degradations) about each individual video recording. In addition, we consider a multi-camera scenario, where multiple user generated recordings of a common scene (e.g., music concerts) are available. For such scenarios we jointly analyze the multiple video recordings and their associated sensor modalities in order to extract higher-level semantics of the recorded media: based on the orientation of the cameras we identify the region of interest of the recorded scene, and by exploiting correlation in the motion of different cameras we detect generic interesting events and estimate their relative position. Furthermore, by also analyzing the audio content captured by multiple users, we detect more specific interesting events. We show that the proposed multimodal analysis methods perform well on various recordings obtained at real live music performances.
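
As a rough illustration of the camera-motion correlation idea, the sketch below flags time instants at which several cameras pan at the same time, using compass-derived angular speed. The sampling rate, speed threshold and minimum number of agreeing cameras are assumptions, not the paper's exact parameters.

```python
# Hedged sketch: flag time instants where many cameras move simultaneously.
# Input: per-camera compass orientation sampled on a common time grid.
import numpy as np


def angular_speed(orientation_deg: np.ndarray, fs: float) -> np.ndarray:
    """Angular speed (deg/s) of one camera, unwrapping 0/360 degree jumps."""
    unwrapped = np.rad2deg(np.unwrap(np.deg2rad(orientation_deg)))
    return np.abs(np.gradient(unwrapped)) * fs


def candidate_events(orientations: np.ndarray, fs: float,
                     speed_thr: float = 20.0, min_cameras: int = 3) -> np.ndarray:
    """Sample indices where at least `min_cameras` cameras pan at once.

    orientations: shape (n_cameras, n_samples), compass headings in degrees.
    speed_thr and min_cameras are illustrative parameters.
    """
    moving = np.stack([angular_speed(o, fs) > speed_thr for o in orientations])
    return np.flatnonzero(moving.sum(axis=0) >= min_cameras)


if __name__ == "__main__":
    fs = 10.0  # assumed 10 Hz compass sampling
    t = np.arange(0, 60, 1 / fs)
    rng = np.random.default_rng(0)
    # Three synthetic cameras that all pan about 40 degrees around t = 30 s.
    cams = []
    for jitter in (0.0, 0.3, -0.2):
        pan = 40.0 / (1 + np.exp(-(t - 30 - jitter) * 4))
        cams.append(pan + rng.normal(0, 0.5, t.size))
    idx = candidate_events(np.array(cams), fs)
    print("candidate event around t =", t[idx].mean() if idx.size else None, "s")
```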


Conference on Multimedia Modeling | 2012

Sensor-based analysis of user generated video for multi-camera video remixing

Francesco Cricri; Igor Danilo Diego Curcio; Sujeet Shyamsundar Mate; Kostadin Dabov; Moncef Gabbouj

In this work we propose to exploit context sensor data for analyzing user generated videos. Firstly, we perform a low-level indexing of the recorded media with the instantaneous compass orientations of the recording device. Subsequently, we exploit this low-level indexing to obtain a higher-level indexing for discovering camera panning movements, classifying them, and identifying the Region of Interest (ROI) of the recorded event. Thus, we extract information about the content not by performing content analysis but by leveraging sensor data analysis. Furthermore, we develop an automatic remixing system that exploits the obtained high-level indexing to produce a video remix. We show that the proposed sensor-based analysis can correctly detect and classify camera panning and identify the ROI; in addition, we provide examples of their application to automatic video remixing.
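
The compass-based indexing described above can be illustrated with a small sketch: the modal compass heading across cameras is taken as the direction of the Region of Interest, and a panning segment is classified by its angular extent. The bin width and class thresholds are illustrative assumptions rather than the values used in the paper.

```python
# Hedged sketch: ROI direction from compass headings, plus a crude
# classification of a panning segment by its angular extent.
import numpy as np


def roi_direction(headings_deg: np.ndarray, bin_deg: float = 10.0) -> float:
    """Dominant pointing direction over all cameras and samples (degrees).

    headings_deg: shape (n_cameras, n_samples). Assumes the modal heading
    across recording devices faces the region of interest.
    """
    bins = np.arange(0, 360 + bin_deg, bin_deg)
    hist, edges = np.histogram(headings_deg.ravel() % 360, bins=bins)
    k = int(np.argmax(hist))
    return float((edges[k] + edges[k + 1]) / 2)


def classify_panning(segment_deg: np.ndarray) -> str:
    """Label one panning segment by its angular extent (thresholds assumed)."""
    extent = np.rad2deg(np.ptp(np.unwrap(np.deg2rad(segment_deg))))
    if extent < 15:
        return "minor"
    return "small" if extent < 60 else "large"


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two cameras mostly pointing around 120 degrees (towards an assumed stage).
    cam_a = 120 + rng.normal(0, 5, 300)
    cam_b = 118 + rng.normal(0, 5, 300)
    print("ROI direction ~", roi_direction(np.vstack([cam_a, cam_b])), "deg")
    print("panning class:", classify_panning(np.linspace(100, 180, 50)))
```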


International Symposium on Multimedia | 2011

Multimodal Event Detection in User Generated Videos

Francesco Cricri; Kostadin Dabov; Igor Danilo Diego Curcio; Sujeet Shyamsundar Mate; Moncef Gabbouj

Nowadays most camera-enabled electronic devices contain various auxiliary sensors such as accelerometers, gyroscopes, compasses, GPS receivers, etc. These sensors are often used during the media acquisition to limit camera degradations such as shake and also to provide some basic tagging information such as the location used in geo-tagging. Surprisingly, exploiting the sensor-recordings modality for high-level event detection has been a subject of rather limited research, further constrained to highly specialized acquisition setups. In this work, we show how these sensor modalities, alone or in combination with content-based analysis, allow inferring information about the video content. In addition, we consider a multi-camera scenario, where multiple user generated recordings of a common scene (e.g., music concerts, public events) are available. In order to understand some higher-level semantics of the recorded media, we jointly analyze the individual video recordings and sensor measurements of the multiple users. The detected semantics include generic interesting events and some more specific events. The detection exploits correlations in the camera motion and in the audio content of multiple users. We show that the proposed multimodal analysis methods perform well on various recordings obtained in real live music performances.
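
The audio side of the joint analysis can be sketched as follows: a candidate specific event is declared where the short-term audio energy peaks in several users' recordings at roughly the same time (assuming the recordings are already aligned on a common clock). The frame length, z-score threshold and the required number of agreeing users are assumptions.

```python
# Hedged sketch: flag frames where the audio energy of several aligned
# recordings peaks simultaneously.
import numpy as np


def frame_energy(x: np.ndarray, sr: int, frame_s: float = 0.5) -> np.ndarray:
    """Short-term RMS energy per non-overlapping frame."""
    n = int(sr * frame_s)
    usable = (len(x) // n) * n
    return np.sqrt((x[:usable].reshape(-1, n) ** 2).mean(axis=1))


def joint_audio_events(signals, sr: int, z_thr: float = 2.0, min_users: int = 2):
    """Frame indices where >= min_users recordings exceed a z-score threshold."""
    energies = [frame_energy(x, sr) for x in signals]
    n = min(len(e) for e in energies)
    peaks = []
    for e in energies:
        e = e[:n]
        peaks.append((e - e.mean()) / (e.std() + 1e-9) > z_thr)
    return np.flatnonzero(np.stack(peaks).sum(axis=0) >= min_users)


if __name__ == "__main__":
    sr = 8000
    rng = np.random.default_rng(2)
    t = np.arange(0, 30 * sr)
    burst = (np.abs(t / sr - 12.0) < 1.0).astype(float) * 0.8  # loud moment at ~12 s
    users = [rng.normal(0, 0.05, t.size) + burst * rng.uniform(0.8, 1.2)
             for _ in range(3)]
    print("joint peaks in frames:", joint_audio_events(users, sr))
```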


IEEE Transactions on Multimedia | 2014

Sport Type Classification of Mobile Videos

Francesco Cricri; Mikko Roininen; Jussi Leppänen; Sujeet Shyamsundar Mate; Igor Danilo Diego Curcio; Stefan Uhlmann; Moncef Gabbouj

The recent proliferation of mobile video content has emphasized the need for applications such as automatic organization and automatic editing of videos. These applications could greatly benefit from domain knowledge about the content. However, extracting semantic information from mobile videos is a challenging task, due to their unconstrained nature. We extract domain knowledge about sport events recorded by multiple users, by classifying the sport type into soccer, American football, basketball, tennis, ice-hockey, or volleyball. We adopt a multi-user and multimodal approach, where each user simultaneously captures audio-visual content and auxiliary sensor data (from magnetometers and accelerometers). Firstly, each modality is separately analyzed; then, analysis results are fused for obtaining the sport type. The auxiliary sensor data is used for extracting more discriminative spatio-temporal visual features and efficient camera motion features. The contribution of each modality to the fusion process is adapted according to the quality of the input data. We performed extensive experiments on data collected at public sport events, showing the merits of using different combinations of modalities and fusion methods. The results indicate that analyzing multimodal and multi-user data, coupled with adaptive fusion, improves classification accuracies in most tested cases, up to 95.45%.
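
The adaptive fusion step can be illustrated with a minimal sketch in which each modality outputs class scores and is weighted by a scalar quality estimate of its input. The specific weighting rule below is an assumption, since the paper compares several fusion methods.

```python
# Hedged sketch: quality-weighted late fusion of per-modality class scores.
import numpy as np

SPORTS = ["soccer", "american_football", "basketball", "tennis", "ice_hockey", "volleyball"]


def fuse(scores_by_modality: dict, quality_by_modality: dict) -> str:
    """Weighted sum of per-modality score vectors, weights ~ input quality.

    scores_by_modality: modality name -> length-6 score vector.
    quality_by_modality: modality name -> quality in [0, 1] (assumed measure).
    """
    total = np.zeros(len(SPORTS))
    weight_sum = 0.0
    for modality, scores in scores_by_modality.items():
        w = quality_by_modality.get(modality, 0.0)
        total += w * np.asarray(scores)
        weight_sum += w
    fused = total / max(weight_sum, 1e-9)
    return SPORTS[int(np.argmax(fused))]


if __name__ == "__main__":
    scores = {
        "visual": [0.10, 0.05, 0.55, 0.10, 0.10, 0.10],
        "audio": [0.30, 0.10, 0.20, 0.10, 0.20, 0.10],
        "motion_sensors": [0.05, 0.05, 0.60, 0.10, 0.10, 0.10],
    }
    quality = {"visual": 0.9, "audio": 0.4, "motion_sensors": 0.7}  # illustrative
    print("predicted sport type:", fuse(scores, quality))
```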


Advances in Multimedia | 2012

Multimodal semantics extraction from user-generated videos

Francesco Cricri; Kostadin Dabov; Mikko Roininen; Sujeet Shyamsundar Mate; Igor Danilo Diego Curcio; Moncef Gabbouj

User-generated video content has grown tremendously fast, to the point of outpacing professional content creation. In this work we develop methods that analyze contextual information of multiple user-generated videos in order to obtain semantic information about public happenings (e.g., sport and live music events) being recorded in these videos. One of the key contributions of this work is the joint utilization of different data modalities, including data captured by auxiliary sensors during the video recording performed by each user. In particular, we analyze GPS data, magnetometer data, accelerometer data, and video and audio content data. We use these data modalities to infer information about the event being recorded, in terms of layout (e.g., stadium), genre, indoor versus outdoor scene, and the main area of interest of the event. Furthermore, we propose a method that automatically identifies the optimal set of cameras to be used in a multi-camera video production. Finally, we detect the camera users who fall within the field of view of other cameras recording at the same public happening. We show that the proposed multimodal analysis methods perform well on various recordings obtained at real sport events and live music performances.
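
The last analysis step mentioned above, detecting which camera users fall within another camera's field of view, can be sketched geometrically: the bearing from a camera's GPS position to each other user is compared against the camera's compass heading plus or minus half an assumed field of view. The flat-earth distance approximation, the 60-degree field of view and the range limit are illustrative assumptions.

```python
# Hedged sketch: which other users lie inside a given camera's field of view,
# using GPS positions and a compass heading (flat-earth approximation).
import math

EARTH_R = 6371000.0  # metres


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360


def users_in_view(camera, others, fov_deg=60.0, max_range_m=150.0):
    """Names of users whose position falls inside the camera's FOV cone."""
    visible = []
    for name, (lat, lon) in others.items():
        brg = bearing_deg(camera["lat"], camera["lon"], lat, lon)
        diff = (brg - camera["heading"] + 180) % 360 - 180  # signed angular difference
        # crude equirectangular distance
        dx = math.radians(lon - camera["lon"]) * EARTH_R * math.cos(math.radians(camera["lat"]))
        dy = math.radians(lat - camera["lat"]) * EARTH_R
        if abs(diff) <= fov_deg / 2 and math.hypot(dx, dy) <= max_range_m:
            visible.append(name)
    return visible


if __name__ == "__main__":
    cam = {"lat": 61.4978, "lon": 23.7610, "heading": 90.0}  # pointing east (illustrative)
    others = {"userB": (61.4979, 23.7625), "userC": (61.4990, 23.7590)}
    print("in view:", users_in_view(cam, others))
```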


World of Wireless, Mobile and Multimedia Networks | 2009

Mobile and Interactive Social Television — A virtual TV room

Francesco Cricri; Sujeet Shyamsundar Mate; Igor Danilo Diego Curcio; Moncef Gabbouj

Smart phones are becoming more and more powerful, and services that were traditionally designed for a static environment can now be implemented on mobile devices. One such service is interactive social TV, which allows geographically dispersed people to meet in a virtual shared space and watch TV while being able to interact with each other. This paper presents two novel architectures for a Mobile and Interactive Social TV system. In both architectures, the interaction is represented by rich audio-visual media, allowing users to hear and see each other. In the first architecture, the mixing of the TV content with the interaction media is performed at the server side; in the second, the mixing is performed in each client device. The issues of decoding and rendering simultaneous content and interaction media streams on a mobile device are discussed, and the related implementation is presented.
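
A highly simplified sketch of the two mixing architectures is given below, with "frames" represented as plain dictionaries; a real system would mix encoded media streams, and all names here are illustrative rather than taken from the paper.

```python
# Hedged sketch of the two mixing architectures: server-side mixing sends each
# client one pre-composed stream, client-side mixing sends separate streams and
# composes them on the device. Names and data structures are illustrative.


def compose(tv_frame, interaction_frames):
    """Overlay interaction media onto the TV picture (stand-in for real mixing)."""
    return {"tv": tv_frame, "overlays": list(interaction_frames)}


class ServerSideMixing:
    """Architecture 1: the server mixes TV + interaction media."""

    def frame_for_client(self, tv_frame, interaction_frames):
        mixed = compose(tv_frame, interaction_frames)  # done once, on the server
        return mixed                                   # single stream per client


class ClientSideMixing:
    """Architecture 2: each client receives separate streams and mixes locally."""

    def frames_for_client(self, tv_frame, interaction_frames):
        return tv_frame, interaction_frames            # several parallel streams

    @staticmethod
    def render_on_device(tv_frame, interaction_frames):
        return compose(tv_frame, interaction_frames)   # decoding/mixing cost on the phone


if __name__ == "__main__":
    tv = {"src": "broadcast", "t": 0}
    friends = [{"src": "alice-cam", "t": 0}, {"src": "bob-cam", "t": 0}]
    print("server-mixed:", ServerSideMixing().frame_for_client(tv, friends))
    tv_s, ia_s = ClientSideMixing().frames_for_client(tv, friends)
    print("client-mixed:", ClientSideMixing.render_on_device(tv_s, ia_s))
```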


International Symposium on Multimedia | 2014

Salient Event Detection in Basketball Mobile Videos

Francesco Cricri; Sujeet Shyamsundar Mate; Igor Danilo Diego Curcio; Moncef Gabbouj

Modern smartphones have become the most popular means of recording videos. Thanks to their portability, smartphones allow for recording anything at any moment of our everyday life. One common occasion is sport events, where people often record their favourite team or players. Automatic analysis of such videos is important for enabling applications such as automatic organization, browsing and summarization of the content. This paper proposes novel algorithms for the detection of salient events in videos recorded at basketball games. The novel approach consists of jointly analyzing visual data and magnetometer data, where the magnetometer data provides information about the horizontal orientation of the camera. The proposed joint analysis allows for a reduced number of false positives and reduced computational complexity. The algorithms are tested on data captured during real basketball games, and the experimental results clearly show the advantages of the proposed approach.
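
A sketch of the joint analysis idea, under simple assumptions: the magnetometer heading is first used to gate the time spans in which the camera points toward one of the baskets, and only those spans are passed to a (placeholder) visual detector, which cuts both false positives and computation. The basket directions, angular tolerance and the dummy visual score are assumptions.

```python
# Hedged sketch: gate visual salient-event analysis by camera heading.
import numpy as np


def pointing_at_basket(headings_deg: np.ndarray, basket_dirs_deg=(80.0, 260.0),
                       tol_deg: float = 25.0) -> np.ndarray:
    """Boolean mask: is the heading within tol of either assumed basket direction?"""
    h = headings_deg[:, None] % 360
    d = np.asarray(basket_dirs_deg)[None, :]
    diff = np.abs((h - d + 180) % 360 - 180)
    return (diff <= tol_deg).any(axis=1)


def visual_score(frame_index: int) -> float:
    """Placeholder for a content-based detector (e.g., motion towards the hoop)."""
    return 1.0 if 40 <= frame_index <= 45 else 0.1  # synthetic 'shot' around frames 40-45


def detect_salient_events(headings_deg: np.ndarray, score_thr: float = 0.5):
    """Run the visual detector only where the heading gate is open."""
    gate = pointing_at_basket(headings_deg)
    return [int(i) for i in np.flatnonzero(gate) if visual_score(int(i)) > score_thr]


if __name__ == "__main__":
    # Camera sweeps from mid-court (0 deg) towards one basket (~80 deg) and back.
    headings = np.concatenate([np.linspace(0, 80, 50), np.linspace(80, 0, 50)])
    print("salient frames:", detect_salient_events(headings))
```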
