Judith Gelernter
Carnegie Mellon University
Publication
Featured research published by Judith Gelernter.
Transactions in GIS | 2011
Judith Gelernter; Nikolai Mushegian
Widespread use of social media during crises has become commonplace, as shown by the volume of messages during the Haiti earthquake of 2010 and the Japan tsunami of 2011. Location mentions are particularly important in disaster messages because they can show emergency responders where problems have occurred. This article explores the sorts of locations that occur in disaster-related social messages, how well off-the-shelf software identifies those locations, and what is needed to improve automated location identification, called geo-parsing. To do this, we sampled Twitter messages from the February 2011 earthquake in Christchurch, Canterbury, New Zealand. We annotated locations in the messages manually to make a gold standard against which to measure locations identified by Named Entity Recognition software. The Stanford NER software found some locations that were proper nouns, but did not identify locations that were not capitalized, local streets and buildings, or the non-standard place abbreviations and misspellings that are plentiful in microtext. We review how these problems might be solved in software research, and model a readable crisis map that shows crisis location clusters via enlarged place labels.
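The gold-standard comparison described in the abstract can be sketched as a simple set overlap. The place names below are illustrative, not drawn from the actual Christchurch annotations:

```python
# Sketch: scoring automatically extracted locations against a manual
# gold standard. Names here are illustrative examples only.

def score(gold, predicted):
    """Return (precision, recall, F1) for two sets of location strings."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)                     # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {"christchurch", "lyttelton", "colombo st", "cathedral square"}
pred = {"christchurch", "lyttelton", "new zealand"}  # misses local names
p, r, f = score(gold, pred)
```

Missing uncapitalized street names and local buildings, as Stanford NER did here, shows up directly as lowered recall.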
Geoinformatica | 2013
Judith Gelernter; Shilpa Balaji
The location of the author of a social media message is not necessarily the same as the location the author writes about in the message. In applications that mine these messages for information, such as tracking news or political events or responding to disasters, it is the geographic content of the message rather than the location of the author that matters. To this end, we present a method to geo-parse the short, informal messages known as microtext. Our preliminary investigation showed that many microtext messages contain place references that are abbreviated, misspelled, or highly localized, and that these references are missed by standard geo-parsers. Our geo-parser is built to find such references. It uses Natural Language Processing methods to identify references to streets and addresses, buildings and urban spaces, toponyms, and place acronyms and abbreviations, combining heuristics, open-source Named Entity Recognition software, and machine learning techniques. Our primary data consisted of Twitter messages sent immediately following the February 2011 earthquake in Christchurch, New Zealand. On this sample, the algorithm identified locations with an F statistic of 0.85 for streets, 0.86 for buildings, 0.96 for toponyms, and 0.88 for place abbreviations, for a combined average F of 0.90. The same data run through a geo-parsing standard, Yahoo! Placemaker, yielded an F statistic of zero for streets and buildings (because Placemaker is designed to find neither) and an F of 0.67 for toponyms.
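One heuristic layer of such a geo-parser, a street-suffix matcher for abbreviated references like "colombo st", might look like the following sketch. The suffix list is illustrative, and the NER and machine learning components the paper combines with such rules are not shown:

```python
import re

# Sketch of a single heuristic rule: flag candidate street references
# by common suffixes and their abbreviations. The suffix list is an
# illustrative assumption, not the paper's actual rule set.
STREET_RE = re.compile(
    r"\b[A-Za-z]+\s+(?:street|st|road|rd|avenue|ave|terrace|tce|lane|ln)\b",
    re.IGNORECASE)

def find_streets(tweet):
    """Return candidate street mentions, lowercased."""
    return [m.group(0).lower() for m in STREET_RE.finditer(tweet)]

hits = find_streets("Liquefaction on colombo st, avoid Bealey Ave!")
```

A rule like this catches lowercase and abbreviated mentions that a capitalization-driven NER model misses, at the cost of false positives that the learned components would then have to filter.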
Geographic Information Retrieval | 2013
Judith Gelernter; Wei Zhang
A geo-parser automatically identifies location words in a text. We have built a geo-parser specifically to find locations in unstructured Spanish text. Our novel geo-parser architecture combines the results of four parsers: a lexico-semantic Named Location Parser, a rules-based building parser, a rules-based street parser, and a trained Named Entity Parser. Each parser has different strengths: the Named Location Parser is strong in recall, the Named Entity Parser is strong in precision, and the building and street parsers find entities that the others are not designed to find. To test the performance of our Spanish geo-parser, we compared its output on Spanish text with the output of our English geo-parser on the same text translated into English. The Spanish geo-parser identified toponyms with an F1 of .796, and the English geo-parser with an F1 of .861 (despite errors introduced by translation from Spanish to English), compared to an F1 of .114 from a commercial off-the-shelf Spanish geo-parser. Results suggest that (1) geo-parsers should be built specifically for unstructured text, as ours have been, and (2) location entities in Spanish that have been machine translated to English remain robust to geo-parsing in English.
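Combining the four parsers' results can be sketched as a union of character spans with an overlap policy. The merge rule (prefer the longer span on overlap) and the example spans below are illustrative assumptions, not the paper's actual combination logic:

```python
# Sketch: merging candidate spans (start, end, text) from several
# specialized parsers. Parser roles follow the paper's architecture;
# the overlap policy here is an illustrative assumption.

def merge(*parser_outputs):
    """Union span results, keeping the longest span where two overlap."""
    spans = sorted({s for out in parser_outputs for s in out},
                   key=lambda s: (s[0], -(s[1] - s[0])))
    merged = []
    for s in spans:
        if merged and s[0] < merged[-1][1]:   # overlaps the kept span
            continue                          # earlier/longer one wins
        merged.append(s)
    return merged

named_location = [(0, 6, "Madrid")]
street_parser = [(25, 36, "Gran Via 12"), (25, 33, "Gran Via")]
combined = merge(named_location, street_parser)
```

Deduplicating overlaps this way lets a high-recall parser and a high-precision parser contribute without double-counting the same mention.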
Proceedings of the Second ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information | 2013
Judith Gelernter; Gautam Ganesh; Hamsini Krishnakumar; Wei Zhang
Geographical knowledge resources, or gazetteers, that are enriched with local information have the potential to add geographic precision to information retrieval. We have identified sources of novel local gazetteer entries in crowd-sourced OpenStreetMap and Wikimapia geotags that include geo-coordinates. We created a fuzzy match algorithm using machine learning (SVM) that checks for both approximate spelling and approximate geocoding in order to find duplicates between the crowd-sourced tags and the gazetteer, in an effort to absorb those tags that are novel. For each crowd-sourced tag, our algorithm generates candidate matches from the gazetteer and then ranks those candidates based on word form or geographical relations between each tag and gazetteer candidate. We compared a baseline of edit distance for candidate ranking to an SVM-trained candidate ranking model on a city-level location tag match task. Experimental results show that the SVM greatly outperforms the baseline.
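The spelling-similarity baseline for candidate ranking can be sketched as below; the gazetteer entries and coordinates are illustrative, and the trained SVM ranker, which combines spelling and geographic features, is not reproduced:

```python
from difflib import SequenceMatcher
from math import asin, cos, radians, sin, sqrt

# Sketch of the string-similarity baseline ranker only. Entries and
# coordinates are illustrative; the paper's SVM ranker is not shown.

def km(a, b):
    """Great-circle (haversine) distance in km between (lat, lon) pairs."""
    la1, lo1, la2, lo2 = map(radians, (*a, *b))
    h = sin((la2 - la1) / 2) ** 2 + cos(la1) * cos(la2) * sin((lo2 - lo1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def rank_candidates(tag, gazetteer):
    """Rank gazetteer entries by name similarity to a crowd-sourced tag."""
    sim = lambda e: SequenceMatcher(None, tag["name"].lower(),
                                    e["name"].lower()).ratio()
    return sorted(gazetteer, key=sim, reverse=True)

tag = {"name": "Pittsburg", "latlon": (40.44, -79.99)}   # common misspelling
gazetteer = [
    {"name": "Pittsburgh", "latlon": (40.4406, -79.9959)},
    {"name": "Petersburg", "latlon": (37.2279, -77.4019)},
]
best = rank_candidates(tag, gazetteer)[0]
```

An SVM ranker would take both the similarity score and the geographic distance `km(...)` as features, so that a close-but-misspelled entry can outrank a similarly spelled entry hundreds of kilometres away.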
ACM/IEEE Joint Conference on Digital Libraries | 2008
Judith Gelernter; Michael Lesk
Maps in journal articles are difficult to access since they are rarely indexed apart from the articles themselves. Our prototype of a searchable map library was built by extracting maps and harvesting metadata from scanned articles to classify each map.
Collaborative Computing | 2007
Judith Gelernter
Collaborative (or social tagging) options are being added to many database catalogs on the assumption that not only those who assign tags but also those who use the catalog find such tags beneficial. But no quantitative analyses of collaborative tags exist to support this assumption. Based on questionnaires mixing collaborative tag clouds from http://www.LibraryThing.com and controlled Library of Congress Subject Heading (LCSH) strings from the Library of Congress catalog http://catalog.loc.gov, it was found that controlled vocabulary terms are selected over collaborative terms; that the string format is preferred to the cloud; that strings appear to "perform" better in terms of reflecting book content; and that it is important to users that recall is high (where uncontrolled vocabulary retrieval is generally low). Results were found to depend on the particulars of each tag cloud or string. The outcome indicates that catalog users would derive fewer information retrieval benefits from the current form of collaborative clouds than from the staid strings of the Library of Congress Subject Headings.
Archive | 2008
Michael Lesk; Judith Gelernter
Even geographers need ways to find what they need among the thousands of maps buried in map libraries and in journal articles. It is not enough to provide search by region and keyword. Studies of queries show that people often want to look for maps showing a certain location at a certain time period or with a subject theme. The difficulties in finding such maps are several. Maps in physical and digital collections often are organized by region. Multi-dimensional manual indexing is time-consuming, and so many maps are not indexed. Further, maps in non-geographical publications are rarely indexed, making them essentially invisible. In an attempt to solve these problems, this dissertation research automatically indexes maps in published documents so that they become visible to searchers. The MapSearch prototype aggregates journal components to allow finer-grained searching of article content. MapSearch allows search by region, time, or theme as well as by keyword (http://scilsresx.rutgers.edu/∼gelern/maps/). Automatic classification of maps is a multi-step process. A sample of 150 maps and the text describing them (which becomes metadata) were copied from a random assortment of journal articles. Experience extracting metadata manually informed the writing of instructions to mine data automatically; experience with manual classification informed the algorithms that classify maps by region, time and theme automatically. That classification is supported by ontologies for region, time and theme that have been generated or adapted for the purpose and that allow what has been called intelligent search, or smart search. The 150-map training set was loaded into the MapSearch engine repeatedly, each time comparing the automatically assigned classification to the manually assigned classification. Analysis of computer misclassifications suggested whether the ontology or the classification algorithm should be modified to improve classification accuracy.
After repeated trials and analyses to improve the algorithms and ontologies, MapSearch was evaluated with a set of 55 previously unseen maps in a test set. Automated classification of the test set of maps was compared to the manual classification, with the assumption that the manual process provides the most accurate classification obtainable. Results showed an accuracy, or a correspondence between manual and automated classification, of 75% for region, 69% for time, and 84% for theme. The dissertation contributes: (1) a protocol to harvest metadata from maps in published articles that could be adapted to aggregate other sorts of journal article components such as charts, diagrams, cartoons or photographs, (2) a method for ontology-supported metadata processing to allow for improved result relevance that could be applied to other sorts of data, (3) algorithms to classify maps into region, time and theme facets that could be adapted to classify other document types, and (4) a proof-of-concept MapSearch system that could be expanded with heterogeneous map types.
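The facet-level accuracy figures reported above are agreement rates between manual and automated classification, as in this sketch; the labels are illustrative, not the actual 55-map test set:

```python
# Sketch: accuracy as the fraction of maps where automated and manual
# classification agree on a facet. Labels below are illustrative.

def accuracy(manual, automated):
    """Fraction of items where the two label sequences agree."""
    hits = sum(m == a for m, a in zip(manual, automated))
    return hits / len(manual)

manual_region = ["Europe", "Asia", "Africa", "Asia"]
automated_region = ["Europe", "Asia", "Europe", "Asia"]
region_acc = accuracy(manual_region, automated_region)
```

Treating the manual labels as ground truth, as the dissertation does, makes this a straightforward per-facet agreement score.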
Social Network Analysis and Mining | 2013
Judith Gelernter; Dong Cao; Kathleen M. Carley
It is often possible to understand group change over time by examining social network data in a spatial and temporal context. Providing that context via text analysis requires identifying locations and associating them with people. Our GeoRef algorithm automates this person-to-place mapping: it identifies locations and uses the syntactic proximity of words in the text to link each location to a person's name. We describe an application of the algorithm to data from the Sudan Tribune, divided into three periods in 2006, covering the Darfur crisis. Contributions of this paper are (1) techniques to mine text for locations, (2) techniques to mine for social network edges (associations between location and person), (3) spatio-temporal maps made from the mined data, and (4) social network analysis based on the mined data.
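One plausible reading of the syntactic-proximity linking, attaching each place mention to the nearest person mention by token distance, can be sketched as follows. The entity lists and sentence are illustrative, and the actual GeoRef algorithm is more involved:

```python
# Sketch: link each place mention to the nearest person mention by
# token distance, one simple proxy for "syntactic proximity". The
# entity lists are illustrative assumptions, not GeoRef's lexicons.

def link_person_to_place(tokens, people, places):
    """Return (person, place) edges, pairing each place with the
    person whose mention is the fewest tokens away."""
    person_pos = {t: i for i, t in enumerate(tokens) if t in people}
    edges = []
    for i, tok in enumerate(tokens):
        if tok in places and person_pos:
            nearest = min(person_pos, key=lambda p: abs(person_pos[p] - i))
            edges.append((nearest, tok))
    return edges

tokens = "Minnawi met officials in Khartoum before returning to Darfur".split()
edges = link_person_to_place(tokens, {"Minnawi"}, {"Khartoum", "Darfur"})
```

Each emitted pair becomes an edge in the person-to-place network, which can then be binned by time period to produce the spatio-temporal maps the paper describes.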
SIGSPATIAL Special | 2009
Judith Gelernter
The goal of this research is to organize maps mined from journal articles into categories for hierarchical browsing within region, time and theme facets. A 150-map training set collected manually was used to develop classifiers. Metadata pertinent to the maps were harvested and then run separately through knowledge sources and our classifiers for region, time and theme. Evaluation of the system on a 54-map test set of unseen maps showed 69%–93% classification accuracy when compared with two human classifications of the same maps. The data mining and semantic analysis methods used here could support systems that index other types of article components, such as diagrams or charts, by region, time and theme.
ACM/IEEE Joint Conference on Digital Libraries | 2009
Judith Gelernter; Michael Lesk
This paper describes techniques for automatically extracting and classifying maps found within articles. The process uses image analysis to find text in maps, document structure to find captions and titles, and then text mining to assign each map to a subject category, a geographical place, and a time period. The text analysis is based on authority lists taken from gazetteers and from library classifications.