Claudia Hauff
Delft University of Technology
Publications
Featured research published by Claudia Hauff.
International World Wide Web Conference | 2012
Fabian Abel; Claudia Hauff; Geert-Jan Houben; Richard Stronkman; Ke Tao
In this paper, we present Twitcident, a framework and Web-based system for filtering, searching and analyzing information about real-world incidents or crises. Twitcident connects to emergency broadcasting services and automatically starts tracking and filtering information from Social Web streams (Twitter) when a new incident occurs. It enriches the semantics of streamed Twitter messages to profile incidents and to continuously improve and adapt the information filtering to the current temporal context. Faceted search and analytical tools allow users to retrieve particular information fragments and overview and analyze the current situation as reported on the Social Web. Demo: http://wis.ewi.tudelft.nl/twitcident/
ACM Conference on Hypertext | 2012
Fabian Abel; Claudia Hauff; Geert-Jan Houben; Richard Stronkman; Ke Tao
Automatically filtering relevant information about a real-world incident from Social Web streams and making the information accessible and findable in the given context of the incident are non-trivial scientific challenges. In this paper, we engineer and evaluate solutions that analyze the semantics of Social Web data streams to solve these challenges. We introduce Twitcident, a framework and Web-based system for filtering, searching and analyzing information about real-world incidents or crises. Given an incident, our framework automatically starts tracking and filtering information that is relevant for the incident from Social Web streams, and from Twitter in particular. It enriches the semantics of streamed messages to profile incidents and to continuously improve and adapt the information filtering to the current temporal context. Faceted search and analytical tools allow people and emergency services to retrieve particular information fragments and to overview and analyze the current situation as reported on the Social Web. We put our Twitcident system into practice by connecting it to emergency broadcasting services in the Netherlands, allowing relevant information to be retrieved from Twitter streams for any incident reported by those services. We conduct large-scale experiments in which we evaluate (i) strategies for filtering relevant information for a given incident and (ii) search strategies for finding particular information pieces. Our results show that the semantic enrichment offered by our framework leads to major and significant improvements in both filtering and search performance. A demonstration is available via: http://wis.ewi.tudelft.nl/twitcident/
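As a rough illustration of the filtering step, the Python sketch below represents an incident as a keyword profile and keeps only tweets that overlap with it sufficiently. The profile structure, keywords, and overlap threshold are illustrative assumptions, not Twitcident's actual implementation, which relies on semantic enrichment rather than raw keyword matching.

```python
import re

# Hypothetical incident profile; the name and keywords are illustrative,
# not the attributes Twitcident actually tracks.
INCIDENT_PROFILE = {
    "name": "moerdijk-fire",
    "keywords": {"fire", "smoke", "chemical", "moerdijk", "evacuation"},
}

def tokenize(text: str) -> set[str]:
    """Lowercase a tweet and split it into word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))

def is_relevant(tweet: str, profile: dict, min_overlap: int = 2) -> bool:
    """Keep a tweet if it shares at least `min_overlap` terms with the
    incident profile (a crude stand-in for semantic filtering)."""
    return len(tokenize(tweet) & profile["keywords"]) >= min_overlap

stream = [
    "Huge fire at the chemical plant in Moerdijk, smoke everywhere",
    "Great coffee this morning!",
    "Evacuation ordered near Moerdijk because of the smoke",
]
for tweet in stream:
    if is_relevant(tweet, INCIDENT_PROFILE):
        print("MATCH:", tweet)
```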
Cross-Language Evaluation Forum (CLEF) | 2008
Dong Nguyen; Arnold Overwijk; Claudia Hauff; Dolf Trieschnigg; Djoerd Hiemstra; Franciska de Jong
This paper presents WikiTranslate, a system which performs query translation for cross-lingual information retrieval (CLIR) using only Wikipedia to obtain translations. Queries are mapped to Wikipedia concepts and the corresponding translations of these concepts in the target language are used to create the final query. WikiTranslate is evaluated by searching with topics formulated in Dutch, French and Spanish in an English data collection. The system achieved 67% of the performance of the monolingual baseline.
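The core lookup can be approximated with the public MediaWiki API: map a source-language term to a Wikipedia article and follow its cross-language link to the target language. This is a simplified sketch of the WikiTranslate idea; the actual system maps entire queries to weighted concepts, which the single-term lookup below does not attempt.

```python
import requests

def wikipedia_translation(term: str, source: str = "nl", target: str = "en"):
    """Look up `term` on the source-language Wikipedia and return the title
    of its cross-language link in the target language, if one exists."""
    resp = requests.get(
        f"https://{source}.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "titles": term,
            "prop": "langlinks",
            "lllang": target,
            "redirects": 1,
            "format": "json",
            "formatversion": 2,
        },
        timeout=10,
    )
    pages = resp.json()["query"]["pages"]
    links = pages[0].get("langlinks", [])
    return links[0]["title"] if links else None

print(wikipedia_translation("fiets"))  # e.g. "Bicycle" via the nl->en langlink
```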
International World Wide Web Conference | 2013
Ke Tao; Fabian Abel; Claudia Hauff; Geert-Jan Houben; Ujwal Gadiraju
With more than 340 million messages posted on Twitter every day, both the amount of duplicate content and the demand for appropriate duplicate detection mechanisms are increasing tremendously. Yet little research has aimed at detecting near-duplicate content on microblogging platforms. We investigate the problem of near-duplicate detection on Twitter and introduce a framework that analyzes tweets by comparing (i) syntactical characteristics, (ii) semantic similarity, and (iii) contextual information. Our framework provides different duplicate detection strategies that, among others, make use of external Web resources which are referenced from microposts. Machine learning is exploited in order to learn patterns that help identify duplicate content. We put our duplicate detection framework into practice by integrating it into Twinder, a search engine for Twitter streams. An in-depth analysis shows that it allows Twinder to diversify search results and improve the quality of Twitter search. We conduct extensive experiments in which we (1) evaluate the quality of different strategies for detecting duplicates, (2) analyze the impact of various features on duplicate detection, (3) investigate the quality of strategies that classify to what exact level two microposts can be considered duplicates and (4) optimize the process of identifying duplicate content on Twitter. Our results show that the semantic features extracted by our framework can boost the performance of detecting duplicates.
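A minimal sketch of how such signals might be combined, assuming hand-picked thresholds in place of the learned classifier used in the paper: a syntactic signal (token-set Jaccard similarity) backed up by contextual signals (shared hashtags or links).

```python
import re

def tokens(t: str) -> set:
    return set(re.findall(r"\w+", t.lower()))

def hashtags(t: str) -> set:
    return set(re.findall(r"#\w+", t.lower()))

def urls(t: str) -> set:
    return set(re.findall(r"https?://\S+", t))

def jaccard(a: set, b: set) -> float:
    """Token-set overlap, the syntactic similarity signal."""
    return len(a & b) / len(a | b) if a | b else 0.0

def near_duplicate(t1: str, t2: str) -> bool:
    """Combine a syntactic signal with two contextual signals.
    Thresholds are illustrative; the paper learns the combination."""
    syntactic = jaccard(tokens(t1), tokens(t2))
    shared_tags = bool(hashtags(t1) & hashtags(t2))
    shared_urls = bool(urls(t1) & urls(t2))
    return syntactic > 0.6 or (syntactic > 0.4 and (shared_tags or shared_urls))
```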
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012
Claudia Hauff; Geert-Jan Houben
Estimating the geographic location of images is a task which has received increasing attention recently. Large numbers of images uploaded to platforms such as Flickr do not contain GPS-based latitude/longitude coordinates. Obtaining such geographic information is beneficial for a variety of applications including travelogues, visual place descriptions and personalized travel recommendations. While most works in this area only exploit an image's textual meta-data (tags, title, etc.) to estimate at what geographic location the image was taken, we consider an additional textual dimension: the image owner's traces on the social Web. Specifically, we hypothesize that information extracted from a person's microblog stream(s) can be utilized to improve the accuracy with which the geographic location of the images is estimated. In this paper, we investigate this hypothesis using Twitter streams and find it to be confirmed. The median error distance in kilometres decreases by up to 67% compared to the existing state of the art. The best results are achieved when tweets posted up to two days before and after an image was taken are considered. Moreover, we find another type of additional information useful: population density data.
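The hypothesis can be illustrated with a toy term-profile estimator: each candidate location has a term profile built from geotagged training images, and the owner's tweets are merged with the image tags before scoring. The locations, terms, and scoring rule below are illustrative assumptions, not the paper's actual model.

```python
from collections import Counter

# Toy per-location term profiles built from geotagged training images;
# locations and terms are illustrative only.
LOCATION_PROFILES = {
    ("52.37", "4.90"): Counter({"amsterdam": 5, "canal": 3, "bike": 2}),
    ("48.86", "2.35"): Counter({"paris": 6, "louvre": 2, "seine": 2}),
}

def estimate_location(image_tags: list, owner_tweets: list):
    """Score each candidate location by how many of its profile terms occur
    in the image tags plus the owner's tweet text; the paper's key idea is
    that the tweets add evidence the tags alone lack."""
    text = set(tag.lower() for tag in image_tags)
    for tweet in owner_tweets:
        text |= set(tweet.lower().split())

    def score(profile):
        return sum(cnt for term, cnt in profile.items() if term in text)

    return max(LOCATION_PROFILES, key=lambda loc: score(LOCATION_PROFILES[loc]))

print(estimate_location(["canal", "boat"], ["Biking through Amsterdam today!"]))
```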
European Conference on Information Retrieval (ECIR) | 2012
Claudia Hauff; Geert-Jan Houben
Estimating the geographic location of images is a task which has received a lot of attention in recent years. Large numbers of items uploaded to Flickr do not contain GPS-based latitude/longitude coordinates, although it would be beneficial to obtain such geographic information for a wide variety of potential applications such as travelogues and visual place descriptions. While most works in this area consider an image's textual meta-data to estimate its geo-location, we consider an additional textual dimension: the image owner's traces on the social Web, in particular on the micro-blogging platform Twitter. We investigate the following question: does enriching an image's available textual meta-data with a user's tweets improve the accuracy of the geographic location estimation process? The results show that this is indeed the case: in an oracle setting, the median error in kilometres decreases by 87%; in the best automatic approach, it decreases by 56%.
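The evaluation metric used in both papers, the median error distance in kilometres between estimated and true locations, can be computed with the standard haversine formula; only the metric itself is taken from the papers, the helper names below are ours.

```python
import math
from statistics import median

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def median_error_km(predicted, truth):
    """Median over per-image error distances between predicted and
    ground-truth coordinates, the papers' headline metric."""
    return median(haversine_km(*p, *t) for p, t in zip(predicted, truth))
```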
ACM Conference on Hypertext | 2014
Jie Yang; Claudia Hauff; Alessandro Bozzon; Geert-Jan Houben
Collaborative Question Answering (cQA) platforms are a very popular repository of crowd-generated knowledge. By formulating questions, users express needs that other members of the cQA community try to collaboratively satisfy. Poorly formulated questions are less likely to receive useful responses, thus hindering the overall knowledge generation process. Users are often asked to reformulate their needs, adding specific details, providing examples, or simply clarifying the context of their requests. Formulating a good question is a task that might require several interactions between the asker and other community members, thus delaying the actual answering and, possibly, decreasing the interest of the community in the issue. This paper contributes new insights to the study of cQA platforms by investigating the editing behaviour of users. We identify a number of editing actions, and provide a two-step approach for the automatic suggestion of the most likely editing actions to be performed for a newly created question. We evaluate our approach in the context of the Stack Overflow cQA platform, demonstrating how, for given types of editing actions, it is possible to provide accurate reformulation suggestions.
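A minimal sketch of the two-step approach using scikit-learn, with toy data: one classifier decides whether a new question needs editing at all, a second predicts the most likely editing action. The labels and features here are illustrative; the paper's actual taxonomy of editing actions and feature set differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data; the labels mimic the kind of editing actions the
# paper identifies, but the exact taxonomy is an assumption.
questions = [
    "My program crashes, please help",
    "How do I parse JSON in Python? Here is my code: ...",
    "Sort a list error",
]
needs_edit = [1, 0, 1]
edit_action = ["add-code", "none", "add-context"]

# Step 1: does the question need editing at all?
step1 = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(questions, needs_edit)
# Step 2: which editing action is most likely?
step2 = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(questions, edit_action)

def suggest_edit(question: str) -> str:
    """Two-step suggestion: first the binary decision, then the action."""
    if step1.predict([question])[0] == 0:
        return "none"
    return step2.predict([question])[0]

print(suggest_edit("App broken pls fix"))
```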
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013
Claudia Hauff
Obtaining geographically tagged multimedia items from social Web platforms such as Flickr is beneficial for a variety of applications including the automatic creation of travelogues and personalized travel recommendations. In order to take advantage of the large number of photos and videos that do not contain (GPS-based) latitude/longitude coordinates, a number of approaches have been proposed to estimate the geographic location where they were taken. Such location estimation methods rely on existing geotagged multimedia items as training data. Across application and usage scenarios, it is commonly assumed that the available geotagged items contain (reasonably) accurate latitude/longitude coordinates. Here, we examine this assumption and investigate how accurate the provided location data is. We conduct a study of Flickr images and videos and find that the accuracy of the geotag information is highly dependent on the popularity of the location: images/videos taken at popular (unpopular) locations are likely to be geotagged with a high (low) degree of accuracy with respect to the ground truth.
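One way to reproduce this kind of analysis, under the assumption that location popularity is approximated by the number of photos falling into a coarse grid cell (the paper may define popularity differently):

```python
from collections import defaultdict
from statistics import median

def grid_cell(lat: float, lon: float, size: float = 0.1):
    """Coarse grid cell used as a proxy for 'location'; per-cell photo
    counts serve as the popularity measure (an illustrative choice)."""
    return (round(lat / size), round(lon / size))

def error_by_popularity(photos, min_count: int = 10):
    """`photos` is a list of (lat, lon, error_km) tuples, where error_km is
    the distance between the user-provided geotag and the ground truth.
    Returns the median geotag error for popular vs. unpopular cells."""
    cells = defaultdict(list)
    for lat, lon, err in photos:
        cells[grid_cell(lat, lon)].append(err)
    popular = [e for errs in cells.values() if len(errs) >= min_count for e in errs]
    unpopular = [e for errs in cells.values() if len(errs) < min_count for e in errs]
    return {
        "popular_median_km": median(popular) if popular else None,
        "unpopular_median_km": median(unpopular) if unpopular else None,
    }
```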
International Conference on Web Engineering (ICWE) | 2012
Ke Tao; Fabian Abel; Claudia Hauff; Geert-Jan Houben
How can one effectively identify relevant messages among the hundreds of millions of Twitter messages that are posted every day? In this paper, we aim to answer this fundamental research question and introduce Twinder, a scalable search engine for Twitter streams. The Twinder search engine exploits various features to estimate the relevance of Twitter messages (tweets) for a given topic. Among these features are topic-sensitive features, such as measures that compute the semantic relatedness between a tweet and a topic, as well as topic-insensitive features, which characterize a tweet with respect to its syntactical, semantic, sentiment and contextual properties. In our evaluations, we investigate the impact of the different features on retrieval performance. Our results demonstrate the effectiveness of the Twinder search engine: semantic features in particular yield precision and recall values of more than 35% and 45%, respectively.
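To make the feature split concrete, here is a small sketch with one topic-sensitive feature (term overlap between tweet and topic) and several topic-insensitive ones, combined linearly; the specific features and weights are illustrative stand-ins for Twinder's learned model.

```python
import re

def features(tweet: str, topic: str) -> dict:
    """Topic-sensitive and topic-insensitive features in the spirit of
    Twinder; the concrete feature set here is an assumption."""
    t_tokens = set(re.findall(r"\w+", tweet.lower()))
    q_tokens = set(re.findall(r"\w+", topic.lower()))
    return {
        "relatedness": len(t_tokens & q_tokens) / max(len(q_tokens), 1),  # topic-sensitive
        "has_url": float("http" in tweet),         # contextual
        "has_hashtag": float("#" in tweet),        # syntactical
        "length": min(len(t_tokens) / 30.0, 1.0),  # syntactical
    }

# Illustrative weights; the paper learns the combination from data.
WEIGHTS = {"relatedness": 3.0, "has_url": 0.5, "has_hashtag": 0.2, "length": 0.3}

def relevance_score(tweet: str, topic: str) -> float:
    f = features(tweet, topic)
    return sum(WEIGHTS[name] * value for name, value in f.items())

print(relevance_score("Flooding reported in Rotterdam http://t.co/x", "rotterdam flooding"))
```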
European Conference on Information Retrieval (ECIR) | 2005
Claudia Hauff; Leif Azzopardi
Much research has been performed investigating how links between web pages can be exploited in an Information Retrieval setting [1,4]. In this poster, we investigate the application of the Barabási-Albert model to link structure analysis on a collection of web documents within the language modeling framework. Our model utilizes the web structure as described by a scale-free network and derives a document prior based on a web document's age and linkage. Preliminary experiments indicate the utility of our approach over other current link structure algorithms and warrant further research.
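A possible reading of such a prior, sketched in Python: under preferential attachment, documents accumulate in-links over time, so a prior that grows with in-degree and age is folded into the query-likelihood score as log P(q|d) + log P(d). The exact functional form below is an illustrative guess, not the formula from the poster.

```python
import math

def document_prior(in_degree: int, age: float, total_degree: int) -> float:
    """Preferential-attachment-style prior: older, better-linked pages get
    more mass, echoing the Barabasi-Albert intuition that a node's degree
    grows with its age. Illustrative form, not the poster's."""
    return (in_degree + 1) * math.sqrt(age + 1) / (total_degree + 1)

def score(query_log_likelihood: float, in_degree: int, age: float,
          total_degree: int) -> float:
    """Language-modelling score with the document prior folded in:
    log P(d|q) is rank-equivalent to log P(q|d) + log P(d)."""
    return query_log_likelihood + math.log(
        document_prior(in_degree, age, total_degree)
    )
```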