Hrishikesh Aradhye
Publications
Featured research published by Hrishikesh Aradhye.
computer vision and pattern recognition | 2010
George Toderici; Hrishikesh Aradhye; Marius Pasca; Luciano Sbaiz; Jay Yagnik
We present a system that automatically recommends tags for YouTube videos solely based on their audiovisual content. We also propose a novel framework for unsupervised discovery of video categories that exploits knowledge mined from World-Wide Web text documents and searches. First, video-content-to-tag associations are learned by training classifiers that map audiovisual content-based features from millions of videos on YouTube.com to existing uploader-supplied tags for these videos. When a new video is uploaded, the labels provided by these classifiers are used to automatically suggest tags deemed relevant to the video. Our system has learned a vocabulary of over 20,000 tags. Second, we mined large volumes of Web pages and search queries to discover a set of possible text entity categories and a set of associated is-A relationships that map individual text entities to categories. Finally, we apply these is-A relationships mined from web text to the tags learned from the audiovisual content of videos to automatically synthesize a reliable set of categories most relevant to videos, along with a mechanism to predict these categories for new uploads. We then present rigorous rating studies that establish that: (a) the average relevance of tags automatically recommended by our system matches the average relevance of the uploader-supplied tags at the same or better coverage, and (b) the average precision@K of video categories discovered by our system is 70% with K=5.
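The final step of the abstract, mapping classifier-predicted tags to categories via mined is-A relationships, can be illustrated with a minimal sketch. All names and the toy is-A data below are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch: aggregate is-A evidence from predicted tags into
# ranked category scores. The IS_A table stands in for relationships
# mined from "X is a Y" patterns in web pages and search queries.
from collections import defaultdict

IS_A = {
    "labrador": [("dog", 0.9), ("pet", 0.7)],
    "puppy":    [("dog", 0.8), ("pet", 0.8)],
    "fetch":    [("game", 0.5)],
}

def rank_categories(predicted_tags, k=5):
    """Turn (tag, score) pairs from a content classifier into top-k categories."""
    scores = defaultdict(float)
    for tag, tag_score in predicted_tags:
        for category, rel_conf in IS_A.get(tag, []):
            scores[category] += tag_score * rel_conf
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

if __name__ == "__main__":
    # Tags with scores as an audiovisual classifier might emit them.
    tags = [("labrador", 0.85), ("puppy", 0.6), ("fetch", 0.4)]
    print(rank_categories(tags))  # [('dog', 1.245), ('pet', 1.075), ('game', 0.2)]
```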
international conference on data mining | 2009
Hrishikesh Aradhye; George Toderici; Jay Yagnik
This paper discusses a new method for automatic discovery and organization of descriptive concepts (labels) within large real-world corpora of user-uploaded multimedia, such as YouTube.com. Conversely, it also provides validation of existing labels, if any. While training, our method does not assume any explicit manual annotation other than the weak labels already available in the form of video title, description, and tags. Prior work related to such auto-annotation assumed that a vocabulary of labels of interest (e.g., indoor, outdoor, city, landscape) is specified a priori. In contrast, the proposed method begins with an empty vocabulary. It analyzes audiovisual features of 25 million YouTube.com videos -- nearly 150 years of video data -- effectively searching for consistent correlation between these features and text metadata. It autonomously extends the label vocabulary as and when it discovers concepts it can reliably identify, eventually leading to a vocabulary with thousands of labels and growing. We believe that this work significantly extends the state of the art in multimedia data mining, discovery, and organization, based on the technical merit of the proposed ideas as well as the enormous scale of the mining exercise in a very challenging, unconstrained, noisy domain.
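A minimal sketch of the vocabulary-growing loop described above, under assumptions: for each candidate label taken from weak metadata, a classifier is trained on audiovisual features, and the label is admitted to the vocabulary only if it can be identified reliably. The feature representation, validation criterion, and thresholds are stand-ins for the real system:

```python
# Hedged sketch: grow a label vocabulary from weak metadata by keeping
# only labels whose audiovisual classifiers cross-validate reliably.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def grow_vocabulary(features, metadata_labels, candidates, min_auc=0.75):
    """features: (n_videos, d) array of audiovisual features;
    metadata_labels: list of per-video sets of weak labels (title/tags);
    candidates: labels observed in metadata."""
    vocabulary = []
    for label in candidates:
        y = np.array([label in labels for labels in metadata_labels])
        if y.sum() < 10 or y.sum() > len(y) - 10:
            continue  # too few positives or negatives to validate
        auc = cross_val_score(LogisticRegression(max_iter=1000),
                              features, y, cv=3, scoring="roc_auc").mean()
        if auc >= min_auc:  # the concept is reliably identifiable
            vocabulary.append((label, auc))
    return vocabulary
```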
international conference on acoustics, speech, and signal processing | 2009
Mehmet Emre Sargin; Hrishikesh Aradhye; Pedro J. Moreno; Ming Zhao
The number of video clips available online is growing at a tremendous pace. Conventionally, user-supplied metadata text, such as the title of the video and a set of keywords, has been the only source of indexing information for user-uploaded videos. Automated extraction of video content for unconstrained, large-scale video databases is a challenging and as yet unsolved problem. In this paper, we present an audiovisual celebrity recognition system aimed at automatic tagging of unconstrained web videos. Prior work on audiovisual person recognition relied on the assumption that the person in the video is speaking and that the features extracted from the audio and visual domains are associated with each other throughout the video. However, this assumption is not valid for unconstrained web videos. The proposed method finds the audiovisual mapping explicitly and hence improves upon the association assumption. Considering the scale of the application, all pieces of the system are trained automatically without any human supervision. We present results on 26,000 videos and show the effectiveness of the method on a per-celebrity basis.
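One way to illustrate the audiovisual association step described above: rather than assuming the on-screen face is always the speaker, score each (face track, speech segment) pair by temporal overlap and keep only confident pairings. The data structures and threshold below are assumptions, not the paper's method:

```python
# Illustrative sketch: pair face tracks with speech segments by temporal
# overlap instead of assuming the visible face is speaking.
def overlap(a, b):
    """Length of intersection of two (start, end) intervals, in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def associate(face_tracks, speech_segments, min_ratio=0.5):
    """face_tracks / speech_segments: {id: (start, end)} dicts.
    Returns (face_id, speaker_id) pairs whose overlap covers most of the track."""
    pairs = []
    for f_id, f_span in face_tracks.items():
        best = max(speech_segments.items(),
                   key=lambda s: overlap(f_span, s[1]), default=None)
        if best is not None:
            ratio = overlap(f_span, best[1]) / (f_span[1] - f_span[0])
            if ratio >= min_ratio:  # keep only confident pairings
                pairs.append((f_id, best[0]))
    return pairs
```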
international conference on acoustics, speech, and signal processing | 2011
Mehmet Emre Sargin; Hrishikesh Aradhye
We consider the problem of large-scale video classification. Our attention is focused on online video services since they can provide rich cross-video signals derived from user behavior. These signals help us extract correlated information across videos that are co-browsed, co-uploaded, co-commented, co-queried, etc. The majority of video classification methods omit this rich information and focus solely on a single test instance. In this paper, we propose a video classification system that exploits various cross-video signals offered by large-scale video databases. In our experiments, we show up to 4.5% absolute equal error rate (17% relative) improvement over the baseline on four video classification problems.
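A toy sketch of one way such cross-video signals can be used, under assumptions: per-video classifier scores are smoothed over a graph whose edges link co-browsed, co-uploaded, or co-commented videos. This label-propagation formulation, the edge weights, and the mixing factor are illustrative stand-ins, not the paper's actual method:

```python
# Hedged sketch: blend each video's own classifier score with the
# consensus of its cross-video neighbors, iterated to convergence.
def propagate(scores, edges, alpha=0.5, iters=10):
    """scores: {video_id: classifier score in [0, 1]};
    edges: {video_id: [(neighbor_id, weight), ...]}; assumes every
    neighbor_id also appears as a key in scores."""
    smoothed = dict(scores)
    for _ in range(iters):
        nxt = {}
        for v, s0 in scores.items():
            nbrs = edges.get(v, [])
            total = sum(w for _, w in nbrs)
            nbr_avg = (sum(w * smoothed[u] for u, w in nbrs) / total
                       if total else s0)
            # Blend the video's own score with its neighborhood consensus.
            nxt[v] = (1 - alpha) * s0 + alpha * nbr_avg
        smoothed = nxt
    return smoothed
```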
international conference on data mining | 2011
Jasper Snoek; Luciano Sbaiz; Hrishikesh Aradhye
This paper explores the problem of large-scale automatic video geolocation. A methodology is developed to infer the location at which videos from Anonymized.com were recorded using video content and various additional signals. Specifically, multiple binary AdaBoost classifiers are trained to identify particular places by learning decision stumps on sets of hundreds of thousands of sparse features. A one-vs-all classification strategy is then used to classify the location at which videos were recorded. Empirical validation is performed on an immense data set of 20 million labeled videos. Results demonstrate that high-accuracy video geolocation is indeed possible for many videos and locations, and that interesting relationships exist between videos and the places where they were recorded.
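The one-vs-all boosted-stump setup named in the abstract maps naturally onto standard library components. A small sketch, with assumed names and data shapes, not the authors' code:

```python
# Hedged sketch: one binary AdaBoost classifier per place, with the
# highest-scoring place predicted for each video via one-vs-all.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.multiclass import OneVsRestClassifier

def build_geolocation_model(n_rounds=100):
    # AdaBoostClassifier's default base learner is a depth-1 decision
    # tree, i.e. a decision stump, matching the setup described above.
    binary = AdaBoostClassifier(n_estimators=n_rounds)
    return OneVsRestClassifier(binary)  # one binary classifier per location

# Usage (X: sparse feature matrix, y: per-video location labels):
# model = build_geolocation_model().fit(X, y)
# predictions = model.predict(X_new)
```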
acm multimedia | 2009
Hrishikesh Aradhye; George Toderici; Jay Yagnik
This paper presents an efficient, personalizable, and yet completely automatic algorithm for enhancing the brightness, tonal balance, and contrast of faces in thumbnails of online videos, where multiple colored illumination sources are the norm and artifacts such as poor illumination and backlight are common. These artifacts significantly lower the perceptual quality of faces and skin, and cannot be easily corrected by common global image transforms. The same identifiable user, however, often uploads or participates in multiple photos, videos, or video chat sessions with varying illumination conditions. The proposed algorithm adaptively transforms the skin pixels in a poor illumination environment to match the skin color model of a prototypical face of the same user in a better illumination environment. It leaves the remaining non-skin portions of the image virtually unchanged while ensuring a smooth, natural appearance. A component of our system automatically selects such a prototypical face for each user, given a collection of uploaded videos/photo albums or prior video chat sessions by that user. We present several human rating studies on YouTube data that quantitatively demonstrate significant improvement in facial quality using the proposed algorithm.
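A hedged sketch of the core correction idea: shift the color statistics of skin pixels toward those of a well-lit prototypical face of the same user while leaving non-skin pixels untouched. The moment-matching transform and the soft mask below are simplifying assumptions, not the paper's exact skin color model:

```python
# Illustrative sketch: match the mean/std of skin pixels to a prototype
# face's skin statistics, blending through a soft skin mask.
import numpy as np

def correct_skin(image, skin_mask, proto_mean, proto_std):
    """image: float32 (H, W, 3) in [0, 1]; skin_mask: (H, W) weights in
    [0, 1]; proto_mean / proto_std: per-channel skin stats of the
    prototypical well-lit face."""
    skin = image[skin_mask > 0.5]
    if skin.size == 0:
        return image  # no skin detected; nothing to correct
    mean, std = skin.mean(axis=0), skin.std(axis=0) + 1e-6
    # Match first and second moments of skin pixels to the prototype.
    transformed = (image - mean) / std * proto_std + proto_mean
    blend = skin_mask[..., None]  # soft mask keeps skin edges natural
    return np.clip(blend * transformed + (1 - blend) * image, 0.0, 1.0)
```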
Archive | 2013
Anna Lynn Patterson; Hrishikesh Aradhye; Wei Hua; Daniel Lehmann; Ruei-Sung Lin
Archive | 2012
Hrishikesh Aradhye; Wei Hua; Ruei-Sung Lin
Archive | 2010
Hrishikesh Aradhye; George Toderici; Jay Yagnik
Archive | 2012
Emre Sargin; Rodrigo Carceroni; Huazhong Ning; Wei Hua; Marius Renn; Hrishikesh Aradhye