Dhiraj Joshi | Researchain

Publication

Featured research published by Dhiraj Joshi.

ACM Computing Surveys | 2008

Image retrieval: Ideas, influences, and trends of the new age

Ritendra Datta; Dhiraj Joshi; Jia Li; James Ze Wang

We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.

European Conference on Computer Vision | 2006

Studying aesthetics in photographic images using a computational approach

Ritendra Datta; Dhiraj Joshi; Jia Li; James Ze Wang

Aesthetics, in the world of art and photography, refers to the principles of the nature and appreciation of beauty. Judging beauty and other aesthetic qualities of photographs is a highly subjective task. Hence, there is no unanimously agreed standard for measuring aesthetic value. In spite of the lack of firm rules, certain features in photographic images are believed, by many, to please humans more than certain others. In this paper, we treat the challenge of automatically inferring aesthetic quality of pictures using their visual content as a machine learning problem, with a peer-rated online photo sharing Website as data source. We extract certain visual features based on the intuition that they can discriminate between aesthetically pleasing and displeasing images. Automated classifiers are built using support vector machines and classification trees. Linear regression on polynomial terms of the features is also applied to infer numerical aesthetics ratings. The work attempts to explore the relationship between emotions which pictures arouse in people, and their low-level content. Potential applications include content-based image retrieval and digital photography.
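
The regression step mentioned in this abstract (linear regression on polynomial terms of the features) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names are hypothetical, the expansion uses only per-feature powers without cross terms, and the fit uses ordinary least squares through the normal equations. It is not the authors' implementation.

```python
def polynomial_terms(features, degree=2):
    """Expand each feature f into f, f**2, ..., f**degree, after a bias term of 1."""
    terms = [1.0]
    for f in features:
        for power in range(1, degree + 1):
            terms.append(f ** power)
    return terms


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_polynomial_regression(samples, ratings, degree=2):
    """Least-squares weights for ratings as a linear function of polynomial terms."""
    rows = [polynomial_terms(s, degree) for s in samples]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ratings)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_rating(weights, features, degree=2):
    """Predicted rating for one feature vector."""
    return sum(w * t for w, t in zip(weights, polynomial_terms(features, degree)))
```

In this sketch, each sample would be a list of visual feature values for one photograph and each rating its peer-assigned score; the model stays linear in its weights even though the terms are nonlinear in the features.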

Evolutionary Computation | 2002

A computationally efficient evolutionary algorithm for real-parameter optimization

Kalyanmoy Deb; Ashish Anand; Dhiraj Joshi

Due to increasing interest in solving real-world optimization problems using evolutionary algorithms (EAs), researchers have recently developed a number of real-parameter genetic algorithms (GAs). In these studies, the main research effort is spent on developing an efficient recombination operator. Such recombination operators use probability distributions around the parent solutions to create an offspring. Some operators emphasize solutions at the center of mass of parents and some around the parents. In this paper, we propose a generic parent-centric recombination operator (PCX) and a steady-state, elite-preserving, scalable, and computationally fast population-alteration model (we call the G3 model). The performance of the G3 model with the PCX operator is investigated on three commonly used test problems and is compared with a number of evolutionary and classical optimization algorithms including other real-parameter GAs with the unimodal normal distribution crossover (UNDX) and the simplex crossover (SPX) operators, the correlated self-adaptive evolution strategy, the covariance matrix adaptation evolution strategy (CMA-ES), the differential evolution technique, and the quasi-Newton method. The proposed approach is found to consistently and reliably perform better than all other methods used in the study. A scale-up study with problem sizes up to 500 variables shows a polynomial computational complexity of the proposed approach. This extensive study clearly demonstrates the power of the proposed technique in tackling real-parameter optimization problems.

IEEE Signal Processing Magazine | 2011

Aesthetics and Emotions in Images

Dhiraj Joshi; Ritendra Datta; Elena A. Fedorovskaya; Quang-Tuan Luong; James Ze Wang; Jia Li; Jiebo Luo

In this tutorial, we define and discuss key aspects of the problem of computational inference of aesthetics and emotion from images. We begin with a background discussion on philosophy, photography, paintings, visual arts, and psychology. This is followed by introduction of a set of key computational problems that the research community has been striving to solve and the computational framework required for solving them. We also describe data sets available for performing assessment and outline several real-world applications where research in this domain can be employed. A significant number of papers that have attempted to solve problems in aesthetics and emotion inference are surveyed in this tutorial. We also discuss future directions that researchers can pursue and make a strong case for seriously attempting to solve problems in this research domain.

Multimedia Tools and Applications | 2011

Geotagging in multimedia and computer vision--a survey

Jiebo Luo; Dhiraj Joshi; Jie Yu; Andrew C. Gallagher

Geo-tagging is a fast-emerging trend in digital photography and community photo sharing. The presence of geographically relevant metadata with images and videos has opened up interesting research avenues within the multimedia and computer vision domains. In this paper, we survey geo-tagging related research within the context of multimedia and along three dimensions: (1) Modalities in which geographical information can be extracted, (2) Applications that can benefit from the use of geographical information, and (3) The interplay between modalities and applications. Our survey will introduce research problems and discuss significant approaches. We will discuss the nature of different modalities and lay out factors that are expected to govern the choices with respect to multimedia and vision applications. Finally, we discuss future research directions in this field.

ACM Transactions on Multimedia Computing, Communications, and Applications | 2006

The Story Picturing Engine---a system for automatic text illustration

Dhiraj Joshi; James Ze Wang; Jia Li

We present an unsupervised approach to automated story picturing. Semantic keywords are extracted from the story, and an annotated image database is searched. Thereafter, a novel image ranking scheme automatically determines the importance of each image. Both lexical annotations and visual content play a role in determining the ranks. Annotations are processed using WordNet. A mutual reinforcement-based rank is calculated for each image. We have implemented the methods in our Story Picturing Engine (SPE) system. Experiments on large-scale image databases are reported. A user study has been performed and statistical analysis of the results has been presented.

Multimedia Information Retrieval | 2004

The story picturing engine: finding elite images to illustrate a story using mutual reinforcement

Dhiraj Joshi; James Ze Wang; Jia Li

In this paper, we present an approach towards automated story picturing based on the mutual reinforcement principle. Story picturing refers to the process of illustrating a story with suitable pictures. In our approach, semantic keywords are extracted from the story text and an annotated image database is searched to form an initial picture pool. Thereafter, a novel image ranking scheme automatically determines the importance of each image. Both lexical annotations and visual content of an image play a role in determining its rank. Annotations are processed using WordNet to derive a lexical signature for each image. An integrated region-based similarity is also calculated between each pair of images. An overall similarity measure is formed using lexical and visual features. In the end, a mutual reinforcement-based rank is calculated for each image using the image similarity matrix. We also present a human behavior model based on a discrete-state Markov process which captures the intuition for our technique. Experimental results have demonstrated the effectiveness of our scheme.
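
The ranking idea in this abstract, where an image is important if it is similar to other important images, can be sketched as a fixed-point iteration over the image similarity matrix. The abstract does not state the update rule, so the repeated multiply-and-normalize scheme below, the function name, and the parameters are all assumptions made for illustration.

```python
def mutual_reinforcement_rank(similarity, iterations=100, tol=1e-10):
    """Rank items so that each rank is proportional to the similarity-weighted
    sum of the other ranks (assumed update rule, not taken from the paper).

    similarity: square list of lists of non-negative pairwise similarities.
    Returns a list of ranks that sums to 1.
    """
    n = len(similarity)
    ranks = [1.0 / n] * n  # start from a uniform ranking
    for _ in range(iterations):
        updated = [
            sum(similarity[i][j] * ranks[j] for j in range(n)) for i in range(n)
        ]
        total = sum(updated)
        if total == 0:
            return ranks  # all similarities are zero; keep the current ranks
        updated = [value / total for value in updated]
        change = max(abs(a - b) for a, b in zip(updated, ranks))
        ranks = updated
        if change < tol:
            break
    return ranks
```

With a symmetric similarity matrix that has positive entries, this iteration settles on the dominant eigenvector, so images that are strongly similar to many others in the pool receive the highest ranks.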

Conference on Image and Video Retrieval | 2008

Inferring generic activities and events from image content and bags of geo-tags

Dhiraj Joshi; Jiebo Luo

The use of contextual information in building concept detectors for digital media has caught the attention of the multimedia community in recent years. Generally speaking, any information extracted from image headers or tags, or from large collections of related images and used at classification time, can be considered as contextual. Such information, being discriminative in its own right, when combined with pure content-based detection systems using pixel information, can improve the overall recognition performance significantly. In this paper, we describe a framework for probabilistically modeling geographical information using a Geographical Information Systems (GIS) database for event and activity recognition in general-purpose consumer images, such as those obtained from Flickr. The proposed framework discriminatively models the statistical saliency of geo-tags in describing an activity or event. Our work leverages the inherent patterns of association between events and their geographical venues. We use descriptions of small local neighborhoods to form bags of geo-tags as our representation. Statistical coherence is observed in such descriptions across a wide range of event classes and across many different users. In order to test our approach, we identify certain classes of activities and events wherein people commonly participate and take pictures. Images and corresponding metadata, for the identified events and activities, are obtained from Flickr. We employ visual detectors obtained from Columbia University (Columbia 374), which perform pure visual event and activity recognition. In our experiments, we present the performance advantage obtained by combining contextual GPS information with pixel-based detection systems.
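
The "bags of geo-tags" representation named in this abstract can be pictured as a count vector over a fixed tag vocabulary. The sketch below is only an illustration: the function name, the whitespace tokenization, and the example place descriptions are assumptions, and the paper's GIS lookup and probabilistic modeling are not reproduced.

```python
from collections import Counter


def bag_of_geo_tags(neighborhood_descriptions, vocabulary):
    """Count how often each vocabulary tag occurs in the textual descriptions
    of places near a photo's location (hypothetical helper)."""
    counts = Counter()
    for description in neighborhood_descriptions:
        # Lowercase and split on whitespace; a real system would likely
        # need more careful tokenization.
        counts.update(description.lower().split())
    return [counts[tag] for tag in vocabulary]


# Example with made-up place descriptions around one photo.
nearby = ["City Park", "park playground", "elementary school"]
vector = bag_of_geo_tags(nearby, ["park", "school", "beach"])
```

A vector of this kind could then be fed to a classifier alongside scores from a visual detector, which is one simple way to read the combination of context and content described above.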

Congress on Evolutionary Computation | 2002

Real-coded evolutionary algorithms with parent-centric recombination

Kalyanmoy Deb; Dhiraj Joshi; Ashish Anand

Due to an increasing interest in solving real-world optimization problems using evolutionary algorithms (EAs), researchers have developed a number of real-parameter genetic algorithms (GAs) in the recent past. In such studies, the main research effort is spent on developing an efficient recombination operator. Such recombination operators use probability distributions around the parent solutions to create offspring. Some operators emphasize solutions at the center of mass of parents and some around the parents. We propose a generic parent-centric recombination operator (PCX) and compare its performance with a couple of commonly-used mean-centric recombination operators (UNDX and SPX). With the help of a steady-state, elite-preserving, and computationally fast EA model, simulation results show the superiority of PCX on three test problems.

ACM Multimedia | 2008

Event recognition: viewing the world with a third eye

Jiebo Luo; Jie Yu; Dhiraj Joshi; Wei Hao

Semantic event recognition based only on vision cues is a challenging problem. This problem is particularly acute when the application domain is unconstrained still images available on the Internet or in personal repositories. In recent years, it has been shown that metadata captured with pictures can provide valuable contextual cues complementary to the image content and can be used to improve classification performance. With the recent geotagging phenomenon, an important piece of metadata available with many geotagged pictures now on the World Wide Web is GPS information. In this study, we obtain satellite images corresponding to picture location data and investigate their novel use to recognize the picture-taking environment, as if through a third eye above the object. Additionally, we combine this inference with classical vision-based event detection methods and study the synergistic fusion of the two approaches. We employ both color- and structure-based visual vocabularies for characterizing ground and satellite images, respectively. Training of satellite image classifiers is done using a multiclass AdaBoost engine while the ground image classifiers are trained using SVMs. Modeling and prediction involve some of the most interesting semantic event-activity classes encountered in consumer pictures, including those that occur in residential areas, commercial areas, beaches, sports venues, and parks. The powerful fusion of the complementary views achieves significant performance improvement over the ground view baseline. With integrated GPS-capable cameras on the horizon, we believe that our line of research can revolutionize event recognition and media annotation in years to come.