Hrishikesh Aradhye
Publications
Featured research published by Hrishikesh Aradhye.
computer vision and pattern recognition | 2010
George Toderici; Hrishikesh Aradhye; Marius Pasca; Luciano Sbaiz; Jay Yagnik
We present a system that automatically recommends tags for YouTube videos solely based on their audiovisual content. We also propose a novel framework for unsupervised discovery of video categories that exploits knowledge mined from World-Wide Web text documents and searches. First, video-content-to-tag associations are learned by training classifiers that map audiovisual content-based features from millions of videos on YouTube.com to existing uploader-supplied tags for these videos. When a new video is uploaded, the labels provided by these classifiers are used to automatically suggest tags deemed relevant to the video. Our system has learned a vocabulary of over 20,000 tags. Second, we mined large volumes of Web pages and search queries to discover a set of possible text entity categories and a set of associated is-A relationships that map individual text entities to categories. Finally, we apply these is-A relationships mined from web text to the tags learned from the audiovisual content of videos to automatically synthesize a reliable set of categories most relevant to videos, along with a mechanism to predict these categories for new uploads. We then present rigorous rating studies that establish that: (a) the average relevance of tags automatically recommended by our system matches the average relevance of the uploader-supplied tags at the same or better coverage, and (b) the average precision@K of video categories discovered by our system is 70% with K=5.
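The final step of the abstract, mapping classifier-predicted tags to categories via mined is-A relationships, can be illustrated with a minimal sketch. All names and the toy is-A data below are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch: aggregate is-A evidence from predicted tags into
# ranked category scores. The IS_A table stands in for relationships
# mined from "X is a Y" patterns in web pages and search queries.
from collections import defaultdict

IS_A = {
    "labrador": [("dog", 0.9), ("pet", 0.7)],
    "puppy":    [("dog", 0.8), ("pet", 0.8)],
    "fetch":    [("game", 0.5)],
}

def rank_categories(predicted_tags, k=5):
    """Turn (tag, score) pairs from a content classifier into top-k categories."""
    scores = defaultdict(float)
    for tag, tag_score in predicted_tags:
        for category, rel_conf in IS_A.get(tag, []):
            scores[category] += tag_score * rel_conf
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

if __name__ == "__main__":
    # Tags with scores as an audiovisual classifier might emit them.
    tags = [("labrador", 0.85), ("puppy", 0.6), ("fetch", 0.4)]
    print(rank_categories(tags))  # [('dog', 1.245), ('pet', 1.075), ('game', 0.2)]
```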
international conference on data mining | 2009
Hrishikesh Aradhye; George Toderici; Jay Yagnik
This paper discusses a new method for automatic discovery and organization of descriptive concepts (labels) within large real-world corpora of user-uploaded multimedia, such as YouTube.com. Conversely, it also provides validation of existing labels, if any. While training, our method does not assume any explicit manual annotation other than the weak labels already available in the form of video title, description, and tags. Prior work related to such auto-annotation assumed that a vocabulary of labels of interest (e.g., indoor, outdoor, city, landscape) is specified a priori. In contrast, the proposed method begins with an empty vocabulary. It analyzes audiovisual features of 25 million YouTube.com videos -- nearly 150 years of video data -- effectively searching for consistent correlation between these features and text metadata. It autonomously extends the label vocabulary as and when it discovers concepts it can reliably identify, eventually leading to a vocabulary with thousands of labels and growing. We believe that this work significantly extends the state of the art in multimedia data mining, discovery, and organization, based on the technical merit of the proposed ideas as well as the enormous scale of the mining exercise in a very challenging, unconstrained, noisy domain.
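A minimal sketch of the vocabulary-growing loop described above, under assumptions: for each candidate label taken from weak metadata, a classifier is trained on audiovisual features, and the label is admitted to the vocabulary only if it can be identified reliably. The feature representation, validation criterion, and thresholds are stand-ins for the real system:

```python
# Hedged sketch: grow a label vocabulary from weak metadata by keeping
# only labels whose audiovisual classifiers cross-validate reliably.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def grow_vocabulary(features, metadata_labels, candidates, min_auc=0.75):
    """features: (n_videos, d) array of audiovisual features;
    metadata_labels: list of per-video sets of weak labels (title/tags);
    candidates: labels observed in metadata."""
    vocabulary = []
    for label in candidates:
        y = np.array([label in labels for labels in metadata_labels])
        if y.sum() < 10 or y.sum() > len(y) - 10:
            continue  # too few positives or negatives to validate
        auc = cross_val_score(LogisticRegression(max_iter=1000),
                              features, y, cv=3, scoring="roc_auc").mean()
        if auc >= min_auc:  # the concept is reliably identifiable
            vocabulary.append((label, auc))
    return vocabulary
```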
international conference on acoustics, speech, and signal processing | 2009
Mehmet Emre Sargin; Hrishikesh Aradhye; Pedro J. Moreno; Ming Zhao
The number of video clips available online is growing at a tremendous pace. Conventionally, user-supplied metadata text, such as the title of the video and a set of keywords, has been the only source of indexing information for user-uploaded videos. Automated extraction of video content for unconstrained, large-scale video databases is a challenging and as yet unsolved problem. In this paper, we present an audiovisual celebrity recognition system aimed at automatic tagging of unconstrained web videos. Prior work on audiovisual person recognition relied on the assumption that the person in the video is speaking and that the features extracted from the audio and visual domains are associated with each other throughout the video. However, this assumption is not valid for unconstrained web videos. The proposed method finds the audiovisual mapping explicitly and hence improves upon the association assumption. Considering the scale of the application, all pieces of the system are trained automatically without any human supervision. We present results on 26,000 videos and show the effectiveness of the method on a per-celebrity basis.
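One way to illustrate the audiovisual association step described above: rather than assuming the on-screen face is always the speaker, score each (face track, speech segment) pair by temporal overlap and keep only confident pairings. The data structures and threshold below are assumptions, not the paper's method:

```python
# Illustrative sketch: pair face tracks with speech segments by temporal
# overlap instead of assuming the visible face is speaking.
def overlap(a, b):
    """Length of intersection of two (start, end) intervals, in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def associate(face_tracks, speech_segments, min_ratio=0.5):
    """face_tracks / speech_segments: {id: (start, end)} dicts.
    Returns (face_id, speaker_id) pairs whose overlap covers most of the track."""
    pairs = []
    for f_id, f_span in face_tracks.items():
        best = max(speech_segments.items(),
                   key=lambda s: overlap(f_span, s[1]), default=None)
        if best is not None:
            ratio = overlap(f_span, best[1]) / (f_span[1] - f_span[0])
            if ratio >= min_ratio:  # keep only confident pairings
                pairs.append((f_id, best[0]))
    return pairs
```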
international conference on acoustics, speech, and signal processing | 2011
Mehmet Emre Sargin; Hrishikesh Aradhye
We consider the problem of large-scale video classification. Our attention is focused on online video services since they can provide rich cross-video signals derived from user behavior. These signals help us extract correlated information across videos that are co-browsed, co-uploaded, co-commented, co-queried, etc. The majority of video classification methods omit this rich information and focus solely on a single test instance. In this paper, we propose a video classification system that exploits various cross-video signals offered by large-scale video databases. In our experiments, we show up to 4.5% absolute equal error rate (17% relative) improvement over the baseline on four video classification problems.
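A toy sketch of one way such cross-video signals can be used, under assumptions: per-video classifier scores are smoothed over a graph whose edges link co-browsed, co-uploaded, or co-commented videos. This label-propagation formulation, the edge weights, and the mixing factor are illustrative stand-ins, not the paper's actual method:

```python
# Hedged sketch: blend each video's own classifier score with the
# consensus of its cross-video neighbors, iterated to convergence.
def propagate(scores, edges, alpha=0.5, iters=10):
    """scores: {video_id: classifier score in [0, 1]};
    edges: {video_id: [(neighbor_id, weight), ...]}; assumes every
    neighbor_id also appears as a key in scores."""
    smoothed = dict(scores)
    for _ in range(iters):
        nxt = {}
        for v, s0 in scores.items():
            nbrs = edges.get(v, [])
            total = sum(w for _, w in nbrs)
            nbr_avg = (sum(w * smoothed[u] for u, w in nbrs) / total
                       if total else s0)
            # Blend the video's own score with its neighborhood consensus.
            nxt[v] = (1 - alpha) * s0 + alpha * nbr_avg
        smoothed = nxt
    return smoothed
```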
international conference on data mining | 2011
Jasper Snoek; Luciano Sbaiz; Hrishikesh Aradhye
This paper explores the problem of large-scale automatic video geolocation. A methodology is developed to infer the location at which videos from Anonymized.com were recorded using video content and various additional signals. Specifically, multiple binary AdaBoost classifiers are trained to identify particular places by learning decision stumps on sets of hundreds of thousands of sparse features. A one-vs-all classification strategy is then used to classify the location at which videos were recorded. Empirical validation is performed on an immense data set of 20 million labeled videos. Results demonstrate that high-accuracy video geolocation is indeed possible for many videos and locations, and that interesting relationships exist between videos and the places where they were recorded.
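The one-vs-all boosted-stump setup named in the abstract maps naturally onto standard library components. A small sketch, with assumed names and data shapes, not the authors' code:

```python
# Hedged sketch: one binary AdaBoost classifier per place, with the
# highest-scoring place predicted for each video via one-vs-all.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.multiclass import OneVsRestClassifier

def build_geolocation_model(n_rounds=100):
    # AdaBoostClassifier's default base learner is a depth-1 decision
    # tree, i.e. a decision stump, matching the setup described above.
    binary = AdaBoostClassifier(n_estimators=n_rounds)
    return OneVsRestClassifier(binary)  # one binary classifier per location

# Usage (X: sparse feature matrix, y: per-video location labels):
# model = build_geolocation_model().fit(X, y)
# predictions = model.predict(X_new)
```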
acm multimedia | 2009
Hrishikesh Aradhye; George Toderici; Jay Yagnik
This paper presents an efficient, personalizable, and yet completely automatic algorithm for enhancing the brightness, tonal balance, and contrast of faces in thumbnails of online videos, where multiple colored illumination sources are the norm and artifacts such as poor illumination and backlight are common. These artifacts significantly lower the perceptual quality of faces and skin, and cannot be easily corrected by common global image transforms. The same identifiable user, however, often uploads or participates in multiple photos, videos, or video chat sessions with varying illumination conditions. The proposed algorithm adaptively transforms the skin pixels in a poor illumination environment to match the skin color model of a prototypical face of the same user in a better illumination environment. It leaves the remaining non-skin portions of the image virtually unchanged while ensuring a smooth, natural appearance. A component of our system automatically selects such a prototypical face for each user, given a collection of uploaded videos/photo albums or prior video chat sessions by that user. We present several human rating studies on YouTube data that quantitatively demonstrate significant improvement in facial quality using the proposed algorithm.
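A hedged sketch of the core correction idea: shift the color statistics of skin pixels toward those of a well-lit prototypical face of the same user while leaving non-skin pixels untouched. The moment-matching transform and the soft mask below are simplifying assumptions, not the paper's exact skin color model:

```python
# Illustrative sketch: match the mean/std of skin pixels to a prototype
# face's skin statistics, blending through a soft skin mask.
import numpy as np

def correct_skin(image, skin_mask, proto_mean, proto_std):
    """image: float32 (H, W, 3) in [0, 1]; skin_mask: (H, W) weights in
    [0, 1]; proto_mean / proto_std: per-channel skin stats of the
    prototypical well-lit face."""
    skin = image[skin_mask > 0.5]
    if skin.size == 0:
        return image  # no skin detected; nothing to correct
    mean, std = skin.mean(axis=0), skin.std(axis=0) + 1e-6
    # Match first and second moments of skin pixels to the prototype.
    transformed = (image - mean) / std * proto_std + proto_mean
    blend = skin_mask[..., None]  # soft mask keeps skin edges natural
    return np.clip(blend * transformed + (1 - blend) * image, 0.0, 1.0)
```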
Archive | 2013
Anna Lynn Patterson; Hrishikesh Aradhye; Wei Hua; Daniel Lehmann; Ruei-Sung Lin
Archive | 2012
Hrishikesh Aradhye; Wei Hua; Ruei-Sung Lin
Archive | 2010
Hrishikesh Aradhye; George Toderici; Jay Yagnik
Archive | 2012
Emre Sargin; Rodrigo Carceroni; Huazhong Ning; Wei Hua; Marius Renn; Hrishikesh Aradhye