Dawid Weiss
Poznań University of Technology
Publications
Featured research published by Dawid Weiss.
ACM Computing Surveys | 2009
Claudio Carpineto; Stanislaw Osinski; Giovanni Romano; Dawid Weiss
Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed in the development of a Web clustering engine, including the acquisition and preprocessing of search results, and their clustering and visualization. Search results clustering, the core of the system, has specific requirements that cannot be addressed by classical clustering algorithms. We emphasize the role played by the quality of the cluster labels as opposed to optimizing only the clustering structure. We highlight the main characteristics of a number of existing Web clustering engines and also discuss how to evaluate their retrieval performance. Finally, some directions for future research are presented.
IEEE Intelligent Systems | 2005
Stanislaw Osinski; Dawid Weiss
Without search engines, the Internet would be an enormous amount of disorganized information that would certainly be interesting but perhaps not very useful. Search engines help us in all kinds of tasks and are constantly improving result relevance. The Lingo algorithm combines common phrase discovery and latent semantic indexing techniques to separate search results into meaningful groups. It looks for meaningful phrases to use as cluster labels and then assigns documents to the labels to form groups.
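Lingo reverses the usual order of clustering: it chooses labels first and only then assigns documents to them. The following is a minimal sketch of just that assignment step, assuming the candidate label phrases are already known; it uses plain term-frequency vectors and cosine similarity, and the function names and the 0.2 threshold are hypothetical choices, not values from the paper. The phrase-discovery and latent semantic indexing steps are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(u, v):
    """Cosine similarity of two sparse term-frequency vectors (Counters)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def assign_to_labels(labels, documents, threshold=0.2):
    """Label-first grouping: every label phrase becomes a cluster and each document
    joins every cluster whose label vector it resembles closely enough.
    Documents matching no label are collected under "Other"."""
    label_vecs = {label: Counter(tokenize(label)) for label in labels}
    clusters = {label: [] for label in labels}
    others = []
    for i, doc in enumerate(documents):
        dvec = Counter(tokenize(doc))
        matched = False
        for label, lvec in label_vecs.items():
            if cosine(lvec, dvec) >= threshold:
                clusters[label].append(i)
                matched = True
        if not matched:
            others.append(i)
    clusters["Other"] = others
    return clusters


result = assign_to_labels(
    ["data mining", "jaguar car"],
    ["data mining tutorial", "jaguar car review", "weather today"],
)
```

Because assignment is driven by labels, a document can land in several clusters, which suits search results that often touch more than one topic.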
intelligent information systems | 2004
Stanisław Osiński; Jerzy Stefanowski; Dawid Weiss
The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a search results list returned by a search engine. In this paper we present Lingo, a novel algorithm for clustering search results that emphasizes cluster description quality. We describe the methods used in the algorithm: algebraic transformations of the term-document matrix and frequent phrase extraction using suffix arrays. Finally, we discuss the results of an empirical evaluation of the algorithm.
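Frequent phrase extraction with a suffix array relies on one property: all suffixes that begin with the same phrase sit next to each other once the suffixes are sorted. The sketch below is an illustrative, assumption-laden version of that idea over word tokens; it uses a naive sort instead of a linear-time construction, counts occurrences rather than documents, and skips the phrase-completeness filtering a full implementation would add. The function names and the separator scheme are hypothetical.

```python
def suffix_array(tokens):
    """Indices of all suffixes of `tokens`, sorted lexicographically.
    Naive O(n^2 log n) construction; adequate for a few hundred snippets."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def frequent_phrases(docs, min_count=2, max_len=4):
    """Return {phrase: occurrences} for every word n-gram (n <= max_len)
    occurring at least `min_count` times (min_count is assumed to be >= 2)."""
    tokens = []
    for k, doc in enumerate(docs):
        tokens.extend(doc.lower().split())
        # A unique separator per document stops phrases from spanning two documents.
        tokens.append(f"\x00sep{k}")
    sa = suffix_array(tokens)
    phrases = {}
    for length in range(1, max_len + 1):
        # Suffixes sharing their first `length` tokens are adjacent in the suffix
        # array, so each repeated phrase appears as one run of equal prefixes.
        prefixes = [tuple(tokens[i:i + length]) for i in sa if len(tokens) - i >= length]
        run_start = 0
        for j in range(1, len(prefixes) + 1):
            if j == len(prefixes) or prefixes[j] != prefixes[run_start]:
                if j - run_start >= min_count:
                    phrases[prefixes[run_start]] = j - run_start
                run_start = j
    return phrases


found = frequent_phrases([
    "search results clustering",
    "clustering of search results",
    "search results",
])
```

Here `found` maps `("search", "results")` to 3, alongside the repeated single words; phrases like these are the raw material from which cluster labels can be chosen.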
atlantic web intelligence conference | 2003
Jerzy Stefanowski; Dawid Weiss
This paper concerns search results clustering, a technique for improving the visualization of results in Web search engines. We introduce Carrot2, an open, extensible research system for the examination and development of search results clustering algorithms. We also discuss attempts at measuring the quality of discovered clusters and present the results of our experiments with quality assessment when documents in an inflectionally rich language (Polish) are clustered using a representative algorithm, Suffix Tree Clustering.
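Inflection matters because phrase-based algorithms such as Suffix Tree Clustering group documents by terms they share verbatim, and a Polish noun can appear in many case forms. The toy sketch below shows the effect with four forms of "komputer" (computer). The prefix-truncation conflation is a crude stand-in chosen only for illustration; the cut-off length of 8 is an arbitrary assumption, and this is not the stemming or lemmatization method used in the paper.

```python
from collections import Counter


def prefix_conflate(token, k=8):
    """Crude conflation: keep the first k characters of a token.
    Illustrative only; k=8 is an arbitrary assumption."""
    return token[:k]


def term_counts(tokens, conflate=None):
    """Count terms, optionally mapping every token through `conflate` first."""
    key = conflate if conflate is not None else (lambda t: t)
    return Counter(key(t) for t in tokens)


# Four inflected forms of the Polish noun "komputer" plus one unrelated word.
tokens = ["komputer", "komputera", "komputerem", "komputery", "sieć"]
raw = term_counts(tokens)                      # five distinct terms, one each
merged = term_counts(tokens, prefix_conflate)  # the four forms collapse into one term
```

A fixed prefix length cuts both ways: short inflected words stay split, while unrelated words sharing a long prefix are wrongly merged, so the pre-processing choice directly shapes which clusters can form.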
intelligent information systems | 2004
Stanislaw Osinski; Dawid Weiss
The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a list of search hits returned by a search engine. In this paper we present the results of an experimental evaluation of a new algorithm named Lingo. We use the Open Directory Project (ODP) as a source of high-quality, narrow-topic document references and mix them into several multi-topic test sets for the algorithm. We then compare the clusters acquired from Lingo to the expected set of ODP categories mixed into the input. Finally, we discuss observations from the experiment, highlighting the algorithm's strengths and weaknesses, and conclude with directions for future research.
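Comparing produced clusters with the ODP categories that were mixed into the input needs some agreement measure. The sketch below uses purity, a standard measure swapped in here for illustration; the paper's own measures may differ. The category names and document ids are hypothetical.

```python
from collections import Counter


def purity(clusters, gold):
    """Purity of a clustering against reference categories.
    clusters: list of lists of document ids; gold: dict id -> category.
    Each cluster is credited with its most frequent category, and purity is the
    fraction of clustered document slots that fall in that majority category."""
    total = sum(len(c) for c in clusters)
    if total == 0:
        return 0.0
    hits = sum(Counter(gold[d] for d in c).most_common(1)[0][1] for c in clusters if c)
    return hits / total


gold = {0: "Arts", 1: "Arts", 2: "Sports", 3: "Sports", 4: "Sports"}
score = purity([[0, 1, 2], [3, 4]], gold)  # (2 + 2) / 5
```

Since Lingo may place one document in several clusters, such a document is counted once per cluster here; a per-document variant would be needed to treat overlapping assignments differently.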
atlantic web intelligence conference | 2005
Stanislaw Osinski; Dawid Weiss
In this paper we present the design goals and implementation outline of Carrot2, an open source framework for rapid development of applications dealing with Web Information Retrieval and Web Mining. The framework was written from scratch with flexibility and processing efficiency in mind. We show two software architectures that meet the requirements of these two aspects and provide evidence of their use in clustering search results. We also discuss the importance and advantages of contributing the results of scientific projects to the open source community and integrating them with it.
adversarial information retrieval on the web | 2008
Jakub Piskorski; Marcin Sydow; Dawid Weiss
We study the usefulness of linguistic features in the Web spam classification task. The features were computed on two Web spam corpora, Webspam-Uk2006 and Webspam-Uk2007, and we make them publicly available to other researchers. Preliminary analysis seems to indicate that certain linguistic features may be useful for the spam-detection task when combined with features studied elsewhere.
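To make "linguistic features" concrete, the sketch below computes two generic text statistics: the type-token ratio (a common measure of lexical diversity) and the entropy of the word distribution. These are illustrative assumptions, not the feature set computed in the paper; the intuition is only that keyword-stuffed pages tend to repeat a small vocabulary, which pushes both values down.

```python
import math
import re
from collections import Counter


def linguistic_features(text):
    """Two generic per-document statistics usable as classifier features."""
    tokens = re.findall(r"\w+", text.lower())
    n = len(tokens)
    if n == 0:
        return {"tokens": 0, "type_token_ratio": 0.0, "entropy": 0.0}
    counts = Counter(tokens)
    ttr = len(counts) / n  # distinct words / all words
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())  # bits per word
    return {"tokens": n, "type_token_ratio": ttr, "entropy": entropy}


stuffed = linguistic_features("buy cheap pills buy cheap pills buy cheap pills")
normal = linguistic_features("the quick brown fox jumps")
```

In a real classifier such values would be one column each in a feature vector, combined with link-based and content-based features.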
intelligent information systems | 2003
Dawid Weiss; Jerzy Stefanowski
In this paper we consider the problem of clustering Web search results in Polish, supporting our analysis with results acquired from an experimental system named Carrot. The algorithm under consideration, Suffix Tree Clustering (STC), has been acknowledged as very efficient when applied to English. We present conclusions from its experimental application to Polish, demonstrating fragile areas of the algorithm related to rich inflection and certain properties of the input language. Our results indicate that the characteristics of the produced clusters (number, distinctiveness) strongly depend on the pre-processing phase. We also attempt to investigate the influence of two primary STC parameters, the merge threshold and the minimum base cluster score, on the number and quality of results. Finally, we introduce two approaches to efficient, approximate conflation of Polish words: a quasi-stemmer and an automaton-based lemmatization method.
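In STC, each phrase shared by several documents forms a base cluster; base clusters are scored and then merged when their document sets overlap heavily. The sketch below covers only the scoring and merging stages, to show where the two parameters named above act. It assumes base clusters are already given as a phrase-to-document-set mapping (no suffix tree is built), and the constants in the weighting function are assumptions that merely follow the general shape described for STC: single words are penalized, the weight grows with phrase length up to six words, then stays flat.

```python
from itertools import combinations


def phrase_score(num_docs, phrase_len):
    """Base cluster score |B| * f(|P|); the constants in f are assumptions."""
    f = 0.5 if phrase_len == 1 else min(phrase_len, 6)
    return num_docs * f


def stc_merge(base_clusters, min_score=2.0, merge_threshold=0.5):
    """base_clusters: dict phrase (tuple of words) -> set of document ids.
    1) drop base clusters scoring below min_score,
    2) link two base clusters when their overlap exceeds merge_threshold
       relative to BOTH of their sizes,
    3) return the connected components as (sorted phrases, document set) pairs."""
    kept = {p: docs for p, docs in base_clusters.items()
            if docs and phrase_score(len(docs), len(p)) >= min_score}
    phrases = list(kept)
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        overlap = len(kept[a] & kept[b])
        if overlap / len(kept[a]) > merge_threshold and overlap / len(kept[b]) > merge_threshold:
            parent[find(a)] = find(b)

    groups = {}
    for p in phrases:
        groups.setdefault(find(p), []).append(p)
    return [(sorted(members), set().union(*(kept[p] for p in members)))
            for members in groups.values()]


base = {
    ("search",): {0, 1, 2},
    ("search", "results"): {0, 1},
    ("clustering",): {1, 2},
    ("jaguar",): {3, 4},
    ("cars",): {4},
}
result = stc_merge(base, min_score=1.0)
```

With `min_score=1.0` the single-document base cluster `("cars",)` is discarded, and the three search-related phrases end up in one cluster even though `("search", "results")` and `("clustering",)` overlap only at the threshold, because both are linked through `("search",)`. Raising either parameter changes the number of final clusters.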
business information systems | 2007
Dawid Weiss; Marcin Zduniak
Applications written for mobile devices have become increasingly complex, keeping pace with the steadily improving computational power of the hardware. With growing application size comes the need for automated testing frameworks, particularly frameworks for automated testing of user interaction and the graphical user interface. While such testing (also called capture-replay) has been thoroughly discussed in the literature with respect to desktop applications, mobile development limits the possibilities significantly. To the best of our knowledge, only a few solutions for creating automated tests of mobile applications exist, and their functionality is either very limited in general or restricted to proprietary devices. In this paper we present preliminary results of our attempt to design and implement a framework for capturing and replaying user interaction in applications written for the Java 2 Micro Edition environment. Our evaluation test bed is a complex commercial mobile navigation system, and the outcomes so far are very promising.
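The capture-replay idea itself is simple: record user input events with their timing during a session, feed the same events back later, and compare the resulting application state. The sketch below illustrates that loop in Python with an entirely hypothetical `DemoApp`; it does not use any Java ME API, and on a real device the recording hooks would sit in the platform's input callbacks.

```python
import time


class DemoApp:
    """Hypothetical application under test: a counter with two screens."""

    def __init__(self):
        self.value = 0
        self.screen = "main"

    def handle(self, event):
        if event == "UP":
            self.value += 1
        elif event == "DOWN":
            self.value -= 1
        elif event == "SELECT":
            self.screen = "details"


class Recorder:
    """Capture phase: wraps an event handler and logs each event with its time offset."""

    def __init__(self, handler, clock=time.monotonic):
        self.handler = handler
        self.clock = clock
        self.start = clock()
        self.log = []

    def __call__(self, event):
        self.log.append((self.clock() - self.start, event))
        self.handler(event)


def replay(log, handler, sleep=time.sleep):
    """Replay phase: feed recorded events to a handler, preserving the gaps between them."""
    previous = 0.0
    for offset, event in log:
        sleep(max(0.0, offset - previous))
        previous = offset
        handler(event)


# Capture a session on one instance, replay it on a fresh one, then compare end states.
original = DemoApp()
recorder = Recorder(original.handle)
for e in ["UP", "UP", "SELECT", "DOWN"]:
    recorder(e)

fresh = DemoApp()
replay(recorder.log, fresh.handle, sleep=lambda seconds: None)  # skip real waiting in a test
```

Injecting `clock` and `sleep` keeps the sketch testable; the final state comparison is what turns a replayed session into an automated regression test.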
artificial intelligence in medicine in europe | 2005
Jerzy Błaszczyński; Ken Farion; Wojtek Michalowski; Szymon Wilk; Steven Rubin; Dawid Weiss
We have developed an algorithm for triaging acute pediatric abdominal pain in the Emergency Department using a discovery-driven approach. This algorithm is embedded in the MET-AP (Mobile Emergency Triage – Abdominal Pain) system, a clinical decision support system that assists physicians in making emergency triage decisions. In this paper we describe an experimental evaluation of several data mining methods (inductive learning, case-based reasoning and Bayesian reasoning) and the results that led to the selection of the rule-based algorithm.
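A rule-based algorithm applies a set of if-then rules to the attributes of a case. The sketch below shows one simple way to apply such rules (first matching rule wins, with a default decision); the attribute names, thresholds and outcomes are placeholders invented for illustration and are not MET-AP's clinical rules, and the abstract does not state which rule-application strategy MET-AP uses.

```python
def classify(case, rules, default):
    """First-match rule classifier.
    rules: list of (conditions, decision) pairs, where conditions maps an
    attribute name to a predicate on that attribute's value."""
    for conditions, decision in rules:
        if all(attr in case and test(case[attr]) for attr, test in conditions.items()):
            return decision
    return default


# Placeholder rules with hypothetical attributes and outcomes (not clinical rules).
rules = [
    ({"attr_a": lambda v: v >= 3, "attr_b": lambda v: v == "yes"}, "outcome_1"),
    ({"attr_a": lambda v: v < 3}, "outcome_2"),
]
```

Rule order matters under first-match semantics; an alternative design lets all matching rules vote, which behaves differently when rules conflict.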