Sergej Sizov
University of Düsseldorf
Publications
Featured research published by Sergej Sizov.
International Semantic Web Conference | 2009
Thomas Franz; Antje Schultz; Sergej Sizov; Steffen Staab
The Semantic Web fosters novel applications targeting a more efficient and satisfying exploitation of the data available on the web, e.g. faceted browsing of linked open data. Large amounts and high diversity of knowledge in the Semantic Web pose the challenging question of appropriate relevance ranking for producing fine-grained and rich descriptions of the available data, e.g. to guide the user along the most promising knowledge aspects. Existing methods for graph-based authority ranking lack support for fine-grained latent coherence between resources and predicates (i.e. support for link semantics in the linked data model). In this paper, we present TripleRank, a novel approach for faceted authority ranking in the context of RDF knowledge bases. TripleRank captures the additional latent semantics of Semantic Web data by means of statistical methods in order to produce richer descriptions of the available data. We model the Semantic Web by a 3-dimensional tensor that enables the seamless representation of arbitrary semantic links. For the analysis of that model, we apply the PARAFAC decomposition, which can be seen as a multi-modal counterpart to Web authority ranking with HITS. The results are groupings of resources and predicates that characterize their authority and navigational (hub) properties with respect to identified topics. We have applied TripleRank to multiple data sets from the linked open data community and gathered encouraging feedback in a user evaluation where TripleRank results were exploited in a faceted browsing scenario.
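The tensor model at the heart of this approach is easy to sketch: each predicate contributes one adjacency slice of a three-mode (subject × object × predicate) tensor, and PARAFAC factorizes it into per-topic hub, authority, and predicate weights. The Python sketch below uses the tensorly library with hypothetical toy triples; the paper's own implementation and preprocessing differ.

```python
# A minimal sketch of TripleRank-style tensor construction and factorization,
# using tensorly's PARAFAC. Data and rank are illustrative, not from the paper.
import numpy as np
from tensorly.decomposition import parafac

# Toy RDF graph: (subject, predicate, object) triples.
triples = [
    ("Berlin", "capitalOf", "Germany"),
    ("Paris",  "capitalOf", "France"),
    ("Berlin", "locatedIn", "Europe"),
    ("Paris",  "locatedIn", "Europe"),
]

entities   = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
predicates = sorted({p for _, p, _ in triples})
e_idx = {e: i for i, e in enumerate(entities)}
p_idx = {p: i for i, p in enumerate(predicates)}

# Mode-3 tensor: one entity-by-entity adjacency slice per predicate.
T = np.zeros((len(entities), len(entities), len(predicates)))
for s, p, o in triples:
    T[e_idx[s], e_idx[o], p_idx[p]] = 1.0

# PARAFAC as a multi-modal analogue of HITS: each factor column is a "topic";
# the subject mode yields hub scores, the object mode authority scores, and
# the predicate mode weights each relation within the topic.
weights, (hubs, authorities, pred_weights) = parafac(T, rank=2)
for k in range(2):
    top_auth = entities[int(np.argmax(authorities[:, k]))]
    print(f"topic {k}: top authority = {top_auth}")
```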
Web Search and Data Mining | 2010
Sergej Sizov
We describe an approach for multi-modal characterization of social media by combining text features (e.g. tags as a prominent example of short, unstructured text labels) with spatial knowledge (e.g. geotags and coordinates of images and videos). Our model-based framework GeoFolk combines these two aspects in order to construct better algorithms for content management, retrieval, and sharing. The approach is based on multi-modal Bayesian models which allow us to integrate spatial semantics of social media in a well-formed, probabilistic manner. We systematically evaluate the solution on a subset of Flickr data, in characteristic scenarios of tag recommendation, content classification, and clustering. Experimental results show that our method outperforms baseline techniques that are based on one of the aspects alone. The approach described in this contribution can also be used in other domains such as Geoweb retrieval.
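The core idea, each latent topic coupling a tag distribution with a spatial distribution, can be illustrated with a small sketch. The class and parameter names below are hypothetical, and the hand-set values stand in for what the model would learn via Bayesian inference.

```python
# A hedged sketch of the GeoFolk idea: each latent topic couples a multinomial
# over tags with a Gaussian over (lat, lon). All parameters here are toy
# stand-ins for values the actual model infers from data.
import numpy as np

class GeoTopic:
    def __init__(self, tag_probs, mean, var):
        self.tag_probs = np.asarray(tag_probs)  # multinomial over tag vocabulary
        self.mean = np.asarray(mean)            # topic centre (lat, lon)
        self.var = var                          # isotropic spatial variance

    def spatial_likelihood(self, coord):
        d2 = np.sum((np.asarray(coord) - self.mean) ** 2)
        return np.exp(-d2 / (2 * self.var)) / (2 * np.pi * self.var)

def recommend_tags(topics, coord, top_n=3):
    """Rank tags by sum_k p(tag|k) * p(coord|k): text and space combined."""
    scores = np.zeros(len(topics[0].tag_probs))
    for t in topics:
        scores += t.tag_probs * t.spatial_likelihood(coord)
    return np.argsort(scores)[::-1][:top_n]

# Toy vocabulary: index 0 = "beach", 1 = "alps", 2 = "skiing".
topics = [
    GeoTopic([0.7, 0.2, 0.1], mean=(43.3, 5.4),  var=1.0),  # coastal topic
    GeoTopic([0.1, 0.5, 0.4], mean=(46.5, 11.3), var=1.0),  # alpine topic
]
print(recommend_tags(topics, coord=(46.4, 11.0)))  # alpine tags rank first
```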
ACM Conference on Hypertext | 2009
Stefan Siersdorfer; Sergej Sizov
The rapidly increasing popularity of Web 2.0 knowledge and content sharing systems and the growing amount of shared data make discovering relevant content and finding contacts a difficult enterprise. Typically, folksonomies provide a rich set of structures and social relationships that can be mined for a variety of recommendation purposes. In this paper we propose a formal model to characterize users, items, and annotations in Web 2.0 environments. Our objective is to construct social recommender systems that predict the utility of items, users, or groups based on the multi-dimensional social environment of a given user. Based on this model we introduce recommendation mechanisms for content sharing frameworks. Our comprehensive evaluation shows the viability of our approach and emphasizes the key role of social meta knowledge for constructing effective recommendations in Web 2.0 applications.
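A stripped-down illustration of folksonomy-based recommendation follows: users are profiled by their tag usage, and unseen items are ranked by the tagging activity of similar users. This is a minimal sketch of the general idea, not the paper's formal model; all data and names are hypothetical.

```python
# A minimal folksonomy recommender sketch: (user, item, tag) assignments,
# user similarity via tag profiles, and item scores from similar users.
from collections import Counter, defaultdict
import math

assignments = [
    ("alice", "photo1", "sunset"), ("alice", "photo2", "beach"),
    ("bob",   "photo1", "sunset"), ("bob",   "photo3", "beach"),
    ("carol", "photo4", "city"),
]

def tag_profile(user):
    return Counter(t for u, _, t in assignments if u == user)

def cosine(p, q):
    num = sum(p[t] * q[t] for t in set(p) & set(q))
    den = math.sqrt(sum(v * v for v in p.values())) * \
          math.sqrt(sum(v * v for v in q.values()))
    return num / den if den else 0.0

def recommend_items(user, top_n=2):
    me = tag_profile(user)
    seen = {i for u, i, _ in assignments if u == user}
    scores = defaultdict(float)
    for other in {u for u, _, _ in assignments} - {user}:
        sim = cosine(me, tag_profile(other))
        for u, item, _ in assignments:
            if u == other and item not in seen:
                scores[item] += sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend_items("alice"))  # bob shares alice's tags, so photo3 ranks high
```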
Journal of Web Semantics | 2009
Renata Queiroz Dividino; Sergej Sizov; Steffen Staab; Bernhard Schueler
The Semantic Web is based on accessing and reusing RDF data from many different sources, to which one may assign different levels of authority and credibility. Existing Semantic Web query languages, like SPARQL, have targeted the retrieval, combination and re-use of facts, but have so far ignored all aspects of meta knowledge, such as origins, authorship, recency or certainty of data. In this paper, we present an original, generic, formalized and implemented approach for managing many dimensions of meta knowledge, like source, authorship, certainty and others. The approach re-uses existing RDF modeling possibilities in order to represent meta knowledge. Then, it extends SPARQL query processing in such a way that given a SPARQL query for data, one may request meta knowledge without modifying the query proper. Thus, our approach achieves highly flexible and automatically coordinated querying for data and meta knowledge, while completely separating the two areas of concern.
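One common way to attach such meta knowledge in RDF today is via named graphs, which gives a rough feel for the idea even though the paper extends SPARQL processing itself rather than relying on explicit GRAPH patterns. A minimal rdflib sketch, with hypothetical example URIs:

```python
# A hedged sketch: facts live in one named graph, meta knowledge about that
# graph (here: its source) in another, and a single SPARQL query retrieves
# both. This illustrates the concern, not the authors' SPARQL extension.
from rdflib import Dataset, Literal, Namespace

EX = Namespace("http://example.org/")
ds = Dataset()

facts = ds.graph(EX.facts)                       # data goes in a named graph
facts.add((EX.Berlin, EX.capitalOf, EX.Germany))

meta = ds.graph(EX.meta)                         # meta knowledge about the graph
meta.add((EX.facts, EX.source, Literal("dbpedia.org")))

# One query returns a fact together with the recorded source of its graph.
q = """
PREFIX ex: <http://example.org/>
SELECT ?capital ?src WHERE {
  GRAPH ?g      { ?capital ex:capitalOf ex:Germany }
  GRAPH ex:meta { ?g ex:source ?src }
}
"""
for row in ds.query(q):
    print(row.capital, "from", row.src)
```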
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004
Stefan Siersdorfer; Sergej Sizov
This paper addresses the problem of automatically structuring heterogeneous document collections by using clustering methods. In contrast to traditional clustering, we study restrictive methods and ensemble-based meta methods that may decide to leave out some documents rather than assigning them to inappropriate clusters with low confidence. These techniques result in higher cluster purity, better overall accuracy, and make unsupervised self-organization more robust. Our comprehensive experimental studies on three different real-world data collections demonstrate these benefits. The proposed methods seem particularly suitable for automatically substructuring personal email folders or personal Web directories that are populated by focused crawlers, and they can be combined with supervised classification techniques.
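The restrictive idea can be sketched concisely: cluster as usual, but leave a document unassigned when its nearest centroid is not clearly closer than the second-nearest. The margin criterion and threshold below are illustrative stand-ins for the paper's confidence measures.

```python
# A minimal sketch of restrictive clustering: low-confidence documents are
# left out (label -1) instead of being forced into a cluster.
import numpy as np
from sklearn.cluster import KMeans

def restrictive_kmeans(X, k, margin=0.2):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    dists = np.sort(km.transform(X), axis=1)          # distances to centroids
    confident = (dists[:, 1] - dists[:, 0]) > margin  # clear nearest centroid?
    return np.where(confident, km.labels_, -1)        # -1 marks "left out"

X = np.array([[0, 0], [0.1, 0], [5, 5], [5.1, 5], [2.5, 2.5]])  # toy documents
print(restrictive_kmeans(X, k=2))  # the ambiguous middle point gets -1
```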
Web Search and Data Mining | 2014
Christoph Carl Kling; Jérôme Kunegis; Sergej Sizov; Steffen Staab
Nowadays, large collections of photos are tagged with GPS coordinates. The modelling of such large geo-tagged corpora is an important problem in data mining and information retrieval, and involves the use of geographical information to detect topics with a spatial component. In this paper, we propose a novel geographical topic model which captures dependencies between geographical regions to support the detection of topics with complex, non-Gaussian distributed spatial structures. The model is based on a multi-Dirichlet process (MDP), a novel generalisation of the hierarchical Dirichlet process extended to support multiple base distributions. Our method is thus called the MDP-based geographical topic model (MGTM). We show how to use an MDP to dynamically smooth topic distributions between groups of spatially adjacent documents. In systematic quantitative and qualitative evaluations using independent datasets from prior related work, we show that such a model can exploit the adjacency of regions and leads to a significant improvement in the quality of topics compared to the state of the art in geographical topic modelling.
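The smoothing step, blending a region's topic distribution with those of adjacent regions so that topics can follow irregular spatial shapes, can be sketched as follows. The fixed blend weight is a hypothetical stand-in for the weights the multi-Dirichlet process learns.

```python
# A hedged sketch of adjacency-based topic smoothing: each region's topic
# distribution is mixed with the mean of its neighbours. The real model
# learns the mixing via an MDP; the constant blend here is illustrative.
import numpy as np

def smooth_topics(theta, adjacency, blend=0.3):
    """theta: regions x topics matrix; adjacency: region -> neighbour indices."""
    smoothed = np.empty_like(theta)
    for r, neighbours in adjacency.items():
        nb_mean = theta[neighbours].mean(axis=0) if neighbours else theta[r]
        smoothed[r] = (1 - blend) * theta[r] + blend * nb_mean
    return smoothed / smoothed.sum(axis=1, keepdims=True)

theta = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]])  # 3 regions, 2 topics
adjacency = {0: [1], 1: [0, 2], 2: [1]}                 # a chain of regions
print(smooth_topics(theta, adjacency))
```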
International World Wide Web Conference | 2008
Bernhard Schueler; Sergej Sizov; Steffen Staab; Duc Thanh Tran
The Semantic Web is based on accessing and reusing RDF data from many different sources, to which one may assign different levels of authority and credibility. Existing Semantic Web query languages, like SPARQL, have targeted the retrieval, combination and reuse of facts, but have so far ignored all aspects of meta knowledge, such as origins, authorship, recency or certainty of data, to name but a few. In this paper, we present an original, generic, formalized and implemented approach for managing many dimensions of meta knowledge, like source, authorship, certainty and others. The approach re-uses existing RDF modeling possibilities in order to represent meta knowledge. Then, it extends SPARQL query processing in such a way that given a SPARQL query for data, one may request meta knowledge without modifying the original query. Thus, our approach achieves highly flexible and automatically coordinated querying for data and meta knowledge, while completely separating the two areas of concern.
European Semantic Web Conference | 2008
Olaf Görlitz; Sergej Sizov; Steffen Staab
Collaborative tagging systems like Flickr and del.icio.us provide centralized content annotation and sharing which is simple to use and attracts many people. A combination of tagging with peer-to-peer systems overcomes typical limitations of centralized systems; however, decentralization also hampers the efficient computation of the global statistics that facilitate user navigation. We present Tagster, a peer-to-peer based tagging system that provides a solution to this challenge. We describe a typical scenario that demonstrates the advantages of distributed content sharing with Tagster.
International Conference on Semantic Computing | 2007
Bernhard Schueler; Sergej Sizov; Steffen Staab
The integration of knowledge from heterogeneous information sources and applications requires not only the conceptual mapping of information structures, but also the treatment of semantic meta knowledge (i.e. knowledge about knowledge) in a generic manner. We describe a generic mechanism for (i) modeling and (ii) querying semantic meta knowledge in the context of RDF repositories. We substantiate our approach with use cases from a project on multi-modal information integration from different information sources.
Very Large Data Bases | 2003
Sergej Sizov; Jens Graupmann; Martin Theobald
Focused crawling is a relatively new, promising approach to improving the recall of expert search on the Web. It typically starts from a user- or community-specific tree of topics along with a few training documents for each tree node, and then crawls the Web with a focus on these topics of interest. This process can efficiently build a theme-specific, hierarchical directory whose nodes are populated with relevant high-quality documents for expert Web search. The BINGO! focused crawler implements an approach that aims to overcome the limitations of the initial training data. BINGO! identifies, among the crawled and positively classified documents of a topic, characteristic archetypes and uses them for periodically retraining the classifier. This way the crawler is dynamically adapted based on the most significant documents seen so far.
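The retraining loop can be sketched in a few lines: documents the current classifier accepts with high confidence are added back as positive training examples, and the classifier is refit. The sketch below uses scikit-learn with toy documents; the threshold is a hypothetical stand-in for BINGO!'s archetype selection criteria.

```python
# A minimal sketch of a BINGO!-style adaptive loop: high-confidence positives
# among crawled pages become "archetypes" used for periodic retraining.
# Fetching and parsing are stubbed out; only the loop itself is shown.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

seed_docs   = ["semantic web rdf ontology", "sparql query linked data"]
negatives   = ["football match results", "cooking pasta recipe"]
train_docs   = seed_docs + negatives
train_labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_docs, train_labels)

crawled = ["rdf triple store benchmark", "owl reasoning engine",
           "celebrity gossip news"]
ARCHETYPE_THRESHOLD = 0.5  # illustrative; BINGO! has its own archetype criteria

for doc in crawled:
    p_topic = clf.predict_proba([doc])[0][1]  # probability of the topic class
    if p_topic > ARCHETYPE_THRESHOLD:         # accept the page as an archetype
        train_docs.append(doc)
        train_labels.append(1)

clf.fit(train_docs, train_labels)             # periodic retraining step
```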