Peter Fankhauser | Researchain

Publication

Featured research published by Peter Fankhauser.

Intelligent User Interfaces | 2007

Lies and propaganda: detecting spam users in collaborative filtering

Bhaskar Mehta; Thomas Hofmann; Peter Fankhauser

Collaborative filtering systems are essentially social systems that base their recommendations on the judgment of a large number of people. However, like other social systems, they are also vulnerable to manipulation by malicious social elements. Lies and propaganda may be spread by a malicious user who has an interest in promoting one item or downplaying the popularity of another. By doing this systematically, either with multiple identities or by involving more people, a few malicious user votes and profiles can be injected into a collaborative recommender system. This can significantly affect the robustness of a system or algorithm, as has been studied in recent work [5, 7]. While current detection algorithms are able to use certain characteristics of spam profiles to detect them, they suffer from low precision and require a large amount of training data. In this work, we provide a simple unsupervised algorithm that exploits statistical properties of effective spam profiles to detect spam quickly and with high accuracy.

8th Conference of the American Association for Corpus Linguistics | 2010

Exploring a corpus of scientific texts using data mining

Elke Teich; Peter Fankhauser

We report on a project investigating the linguistic properties of English scientific texts on the basis of a corpus of journal articles from nine academic disciplines. The goal of the project is to gain insights into registers emerging at the boundaries of computer science and some other discipline (e.g., bioinformatics, computational linguistics, computational engineering). The questions we focus on in this paper are (a) how characteristic the corpus is of the meta-register it represents, and (b) how different or similar the subcorpora are in terms of the more specific registers they instantiate. We analyze the corpus using several data-mining techniques, including feature ranking, clustering, and classification, to see how the subcorpora group in terms of selected linguistic features. The results show that our corpus is well distinguished in terms of the meta-register of scientific writing; we also find interesting distinctive features for the subcorpora as indicators of register diversification. Apart from presenting the results of our analyses, we also reflect upon and assess the use of data mining for the tasks of corpus exploration and analysis.

Association for Information Science and Technology | 2016

The linguistic construal of disciplinarity: A data-mining approach using register features

Elke Teich; Stefania Degaetano-Ortlieb; Peter Fankhauser; Hannah Kermes; Ekaterina Lapshinova-Koltunski

We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question of whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (SciTex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated part-of-speech-based features, n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.

Archive | 2004