Publication


Featured research published by Gregory Grefenstette.


ACM/IEEE Joint Conference on Digital Libraries | 2008

Gazetiki: automatic creation of a geographical gazetteer

Adrian Popescu; Gregory Grefenstette; Pierre-Alain Moëllic

Geolocalized databases are becoming necessary in a wide variety of application domains. Thus far, the creation of such databases has been a costly, manual process. This drawback has stimulated interest in automating their construction, for example, by mining geographical information from the Web. Here we present and evaluate a new automated technique for creating and enriching a geographical gazetteer, called Gazetiki. Our technique merges disparate information from Wikipedia, Panoramio, and web search engines in order to identify geographical names, categorize these names, find their geographical coordinates and rank them. The information produced in Gazetiki enhances and complements the Geonames database, using a similar domain model. We show that our method provides a richer structure and improved coverage compared to another known attempt at automatically building a geographic database and, where possible, we compare Gazetiki to Geonames.
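
The abstract describes a pipeline that merges place-name evidence from several sources into ranked, categorized, geocoded records. The sketch below illustrates that merge step only; the field names, the candidate lists, and the source-count ranking rule are assumptions for illustration, not the actual Gazetiki implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GazetteerEntry:
    name: str
    category: Optional[str] = None        # e.g. "city", "museum", "river"
    lat: Optional[float] = None
    lon: Optional[float] = None
    sources: set = field(default_factory=set)

    @property
    def rank_score(self) -> float:
        # Toy ranking: a name confirmed by more sources ranks higher.
        return float(len(self.sources))

def merge_candidates(*source_lists):
    """Merge per-source candidate records into ranked gazetteer entries."""
    merged = {}
    for source_name, candidates in source_lists:
        for cand in candidates:
            entry = merged.setdefault(cand["name"], GazetteerEntry(cand["name"]))
            entry.sources.add(source_name)
            entry.category = entry.category or cand.get("category")
            if entry.lat is None:
                entry.lat, entry.lon = cand.get("lat"), cand.get("lon")
    return sorted(merged.values(), key=lambda e: e.rank_score, reverse=True)

# Hand-made candidates standing in for the Wikipedia and Panoramio mining steps.
wiki = [{"name": "Louvre", "category": "museum", "lat": 48.8606, "lon": 2.3376}]
panoramio = [{"name": "Louvre"}, {"name": "Pont Neuf", "category": "bridge"}]
for entry in merge_candidates(("wikipedia", wiki), ("panoramio", panoramio)):
    print(entry.name, entry.category, entry.rank_score)
```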


International Conference on Computational Linguistics | 2009

Conquering Language: Using NLP on a Massive Scale to Build High Dimensional Language Models from the Web

Gregory Grefenstette

Dictionaries only contain some of the information we need to know about a language. The growth of the Web, the maturation of linguistic processing tools, and the decline in price of memory storage allow us to envision descriptions of languages that are much larger than before. We can conceive of building a complete language model for a language using all the text that is found on the Web for this language. This article describes our current project to do just that.
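
As a rough illustration of one building block of such a Web-scale description of a language, the sketch below counts word n-grams over a small stand-in corpus; the whitespace tokenizer and the toy documents are placeholder assumptions, and the large-scale crawling and linguistic processing of the actual project are not shown.

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield successive n-grams from a token list."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

def count_ngrams(documents, n=2):
    """Accumulate n-gram counts over an iterable of raw text documents."""
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()      # placeholder for real tokenization
        counts.update(ngrams(tokens, n))
    return counts

# Toy corpus standing in for Web-crawled text in the target language.
corpus = ["the cat sat on the mat", "the cat ate the fish"]
for gram, freq in count_ngrams(corpus).most_common(3):
    print(" ".join(gram), freq)
```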


International Workshop on Semantic Media Adaptation and Personalization | 2006

Using Semantic Commonsense Resources in Image Retrieval

Adrian Popescu; Gregory Grefenstette; Pierre-Alain Moëllic

Many people use the Internet to find pictures of things. When extraneous images appear in response to simple queries on a search engine, the user has a hard time understanding why his seemingly clear request was not properly satisfied. If the computer could only understand what he wanted better, then maybe the results would be more precise. We believe that the introduction of an ontology, though hidden from the user, into current image retrieval engines would provide more accurate image responses to his query. Coordinating the use of an ontology (an OWL representation of WordNet) with image processing techniques, we have developed a system that, given an initial query, automatically returns images associated with the query by specializing the query concept using only its deepest hyponyms from the ontology. We show that picking randomly from this new set of images provides a better representation for the initial, more general query. In addition, we exploit the visual aspects of the images for these deepest hyponyms (the leaves of WordNet) to cluster the images into coherent sets. In this way we can present the results in a structured, and even ontologically labeled, manner to the user.
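
A minimal sketch of the query-specialization step described above, assuming NLTK's WordNet interface in place of the OWL rendering used in the paper; fetch_images() is a hypothetical retrieval callback, and the leaf-collection logic is a plain reading of "deepest hyponyms", not the authors' code.

```python
import random
from nltk.corpus import wordnet as wn   # requires the NLTK WordNet data

def leaf_hyponyms(synset):
    """Return the hyponyms of `synset` that have no hyponyms themselves."""
    leaves, stack = [], [synset]
    while stack:
        node = stack.pop()
        children = node.hyponyms()
        if children:
            stack.extend(children)
        else:
            leaves.append(node)
    return leaves

def specialized_image_sample(query, fetch_images, per_leaf=5, sample_size=20):
    """Retrieve images for each leaf hyponym of the query, then sample."""
    synsets = wn.synsets(query, pos=wn.NOUN)
    if not synsets:
        return fetch_images(query)[:sample_size]
    pool = []
    for leaf in leaf_hyponyms(synsets[0]):
        term = leaf.lemmas()[0].name().replace("_", " ")
        pool.extend(fetch_images(term)[:per_leaf])
    return random.sample(pool, min(sample_size, len(pool)))
```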


Advances in Semantic Media Adaptation and Personalization | 2008

Improving Image Retrieval Using Semantic Resources

Adrian Popescu; Gregory Grefenstette; Pierre-Alain Moëllic

Many people use the Internet to find pictures of things. When extraneous images appear in response to simple queries on a search engine, the user has a hard time understanding why his seemingly clear request was not properly satisfied. If the computer could only understand what he wanted better, then maybe the results would be more precise. The introduction of an ontology, though hidden from the user, into current image retrieval engines may provide more accurate image responses to his query. The improvement of the results translates into the possibility of offering structured results, disambiguating queries, and providing more interactivity options to the user, transforming the current character-string-based retrieval into a concept-based process. Each of these aspects is presented and examples are used to support our proposals. We also discuss the notion of picturability and justify our choice to work exclusively with entities that can be directly represented in a picture. Coordinating the use of a lexical ontology (an OWL representation of WordNet) with image processing techniques, we have developed a system that, given an initial query, automatically returns images associated with the query using automatic reformulation (each concept is represented by its deepest hyponyms from the ontology). We show that picking randomly from this new set of pictures provides an improved representation for the initial, more general query. We also treat the visual aspects of the images for these deepest hyponyms (the leaves of WordNet). The depictions associated with leaf categories are clustered into coherent sets using low-level image features such as color and texture. Some limitations of ontology-based retrieval (e.g., the quality and coverage of the semantic structure, the impossibility of answering complex queries) are also discussed.
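
The clustering step at the end of this abstract can be sketched as follows, assuming a coarse RGB color histogram and scikit-learn's KMeans as stand-ins for the paper's actual color/texture descriptors and grouping method.

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def color_histogram(path, bins=4):
    """Flattened, normalized RGB histogram with `bins` levels per channel."""
    img = np.asarray(Image.open(path).convert("RGB").resize((64, 64)))
    hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    hist = hist.flatten()
    return hist / hist.sum()

def cluster_images(paths, n_clusters=3):
    """Group image files into visually coherent sets by color similarity."""
    features = np.stack([color_histogram(p) for p in paths])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    clusters = {}
    for path, label in zip(paths, labels):
        clusters.setdefault(int(label), []).append(path)
    return clusters
```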


International Symposium on Visual Computing | 2007

Deriving a priori co-occurrence probability estimates for object recognition from social networks and text processing

Guillaume Pitel; Christophe Millet; Gregory Grefenstette

Certain components in images can be recognized with high accuracy, for example, backgrounds such as leaves, grass, snow, sky, and water. These components provide the human eye with context for identifying items in the foreground. Likewise for the machine, the identification of background should help in the recognition of foreground objects. But, in this case, the computer needs explicit lists of object and background co-occurrence probabilities. We examine two ways of deriving estimates of these a priori object co-occurrence probabilities: using Flickr, an online social network of people storing annotated images; and using variations on co-occurrence frequencies in natural language text. We show that the object co-occurrence probabilities derived from both sources are very similar. The possibility of using non-image-derived semantic knowledge drawn from text processing for object recognition opens up possibilities of mining a priori probabilities for a much wider class of objects than those found in manually annotated collections.
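
The quantity being estimated can be sketched in a few lines of Python: for every pair of labels, count the fraction of annotated photos on which both appear. The hand-made tag sets below stand in for Flickr annotations, and this simple joint-frequency estimate is an illustrative choice rather than the exact estimator used in the paper.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_probabilities(tag_sets):
    """Return P(a and b appear on the same photo) for every label pair."""
    pair_counts = Counter()
    for tags in tag_sets:
        for a, b in combinations(sorted(set(tags)), 2):
            pair_counts[(a, b)] += 1
    total = len(tag_sets)
    return {pair: count / total for pair, count in pair_counts.items()}

# Toy stand-in for Flickr photo annotations.
photos = [{"boat", "water", "sky"}, {"cow", "grass"}, {"boat", "water"}]
probs = cooccurrence_probabilities(photos)
print(probs[("boat", "water")])   # 2 of 3 photos contain both labels
```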


Proceedings of the 2006 International Workshop on Research Issues in Digital Libraries | 2006

Toward a common semantics between media and languages

Christian Fluhr; Gregory Grefenstette; Adrian Popescu

For a computer to recognize objects, persons, situations or actions in multimedia, it needs to have learned models of each thing beforehand. For the moment, no large, general collection of training examples exists for the wide variety of things that we would want to automatically recognize in multimedia, video and still images. We believe that the WWW and current technology can allow us to automatically build such a resource. This paper describes a methodology for the construction of a grounded, general-purpose, multimedia ontology that is instantiated through web processing. In this hierarchically organized ontology, concepts corresponding to concrete objects, persons, situations and actions are linked with still images, videos and sounds that represent exemplars of each concept. These examples are necessary resources for computing discriminating signatures for the recognition of the concepts in still images or videos. Since images retrieved using existing image search engines contain much noise and are not always representative, we also present here our methodology for finding good representatives for each concept.
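
A minimal sketch of the kind of structure this methodology targets: a hierarchical concept node linked to exemplar media gathered from the Web, with a hook for filtering out unrepresentative images. The field names and the filter are illustrative assumptions, not the authors' ontology schema.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    label: str
    children: list = field(default_factory=list)   # child Concept nodes
    images: list = field(default_factory=list)     # exemplar image URLs/paths
    videos: list = field(default_factory=list)
    sounds: list = field(default_factory=list)

    def add_image_exemplars(self, candidates, is_representative):
        """Keep only candidates accepted by a representativeness filter,
        reflecting the noise problem raised in the abstract."""
        self.images.extend(url for url in candidates if is_representative(url))

# Tiny illustrative fragment of such a hierarchy.
animal = Concept("animal")
dog = Concept("dog")
animal.children.append(dog)
dog.add_image_exemplars(["http://example.org/dog1.jpg"], lambda url: True)
```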


WEC (2) | 2005

Evaluating Content Based Image Retrieval Techniques with the One Million Images CLIC TestBed.

Pierre-Alain Moëllic; Patrick Hède; Gregory Grefenstette; Christophe Millet


Archive | 2008

Automatic translation method

Christian Fluhr; Gregory Grefenstette; Nasredine Semmar


Semantics and Digital Media Technologies | 2006

Imaging Words - Wording Image.

Adrian Popescu; Gregory Grefenstette; Christophe Millet; Pierre-Alain Moëllic; Patrick Hède


Archive | 2007

Method for automatic translation

Christian Fluhr; Gregory Grefenstette; Nasredine Semmar
