Martin Rajman

École Polytechnique Fédérale de Lausanne

Publication

Featured research published by Martin Rajman.

european conference on principles of data mining and knowledge discovery | 1998

Text Mining at the Term Level

Ronen Feldman; Moshe Fresko; Yakkov Kinar; Yehuda Lindell; Orly Liphstat; Martin Rajman; Yonatan Schler; Oren Zamir

Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Previous work in text mining focused on the word or the tag level. This paper presents an approach to performing text mining at the term level. The mining process starts by preprocessing the document collection and extracting terms from the documents. Each document is then represented by a set of terms and annotations characterizing the document. Terms and additional higher-level entities are then organized in a hierarchical taxonomy. In this paper we describe the Term Extraction module of the Document Explorer system and provide an experimental evaluation performed on a set of 52,000 documents published by Reuters in the years 1995–1996.

international conference on data engineering | 2007

Scalable Peer-to-Peer Web Retrieval with Highly Discriminative Keys

Ivana Podnar; Martin Rajman; Toan Luu; Fabius Klemm; Karl Aberer

The suitability of peer-to-peer (P2P) approaches for full-text Web retrieval has recently been questioned because of the claimed unacceptable bandwidth consumption induced by retrieval from very large document collections. In this contribution we formalize a novel indexing/retrieval model that achieves high-performance, cost-efficient retrieval by indexing with highly discriminative keys (HDKs) stored in a distributed global index maintained in a structured P2P network. HDKs correspond to carefully selected terms and term sets appearing in a small number of collection documents. We provide a theoretical analysis of the scalability of our retrieval model and report experimental results obtained with our HDK-based P2P retrieval engine. These results show that, despite increased indexing costs, the total traffic generated with the HDK approach is significantly smaller than that obtained with distributed single-term indexing strategies. Furthermore, our experiments show that the retrieval performance obtained with a random set of real queries is comparable to that of a centralized, single-term solution using the best state-of-the-art BM25 relevance computation scheme. Finally, our scalability analysis demonstrates that the HDK approach can scale to large networks of peers indexing Web-size document collections, thus opening the way towards viable, truly decentralized Web retrieval.
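
As a rough illustration of the idea of indexing with keys that appear in only a small number of documents, the following minimal sketch keeps single terms whose document frequency is at most a threshold and then forms pairs from the remaining, too-frequent terms. The function name `discriminative_keys`, the parameter `df_max`, the whitespace tokenization, and the restriction to pairs are all assumptions made for this example; the model described in the abstract may apply further selection criteria that are not shown here.

```python
from collections import defaultdict
from itertools import combinations


def discriminative_keys(docs, df_max):
    """Map each kept key (a frozenset of terms) to the set of documents containing it.

    Assumption for this sketch: a key is "discriminative" when it occurs in at
    most df_max documents. Pairs are built only from terms that are
    individually too frequent to be keys on their own.
    """
    # Single-term posting sets: term -> ids of documents containing it.
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            postings[term].add(doc_id)

    # Rare single terms are keys as they are.
    keys = {frozenset([t]): ids for t, ids in postings.items() if len(ids) <= df_max}

    # Frequent terms are combined into pairs; a pair is kept if it is rare enough.
    frequent = sorted(t for t, ids in postings.items() if len(ids) > df_max)
    for a, b in combinations(frequent, 2):
        shared = postings[a] & postings[b]
        if 0 < len(shared) <= df_max:
            keys[frozenset([a, b])] = shared
    return keys


# Toy collection (hypothetical data).
docs = {
    1: "peer network retrieval",
    2: "peer network index",
    3: "peer retrieval index",
    4: "peer web",
}
keys = discriminative_keys(docs, df_max=1)
for key, ids in keys.items():
    print(sorted(key), sorted(ids))
```

In this toy collection the term "peer" occurs in every document and never becomes a key, while pairs such as "index" and "network" identify a single document.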

Proc. of EUROSTAT Conference | 1998

Text Mining - Knowledge extraction from unstructured textual data

Martin Rajman; Romaric Besançon

In the general context of Knowledge Discovery, specific techniques, called Text Mining techniques, are necessary to extract information from unstructured textual data. The extracted information can then be used for the classification of the content of large textual bases. In this paper, we present two examples of information that can be automatically extracted from text collections: probabilistic associations of key-words and prototypical document instances. The Natural Language Processing (NLP) tools necessary for such extractions are also presented.
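
One common way to quantify an association between keywords is through the support and confidence measures used for association rules. The sketch below assumes that reading of "probabilistic associations"; the function name `keyword_association` and the sample keyword sets are hypothetical, and the paper's own definition may differ.

```python
def keyword_association(doc_keywords, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent.

    doc_keywords is a list of keyword sets, one per document.
    support    = fraction of documents containing both keyword sets
    confidence = estimated P(consequent | antecedent)
    """
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for kws in doc_keywords if a <= kws)
    n_both = sum(1 for kws in doc_keywords if both <= kws)
    support = n_both / len(doc_keywords)
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence


# Hypothetical keyword sets for four documents.
collection = [
    {"oil", "opec", "price"},
    {"oil", "opec"},
    {"oil", "gas"},
    {"wheat", "price"},
]
support, confidence = keyword_association(collection, {"oil"}, {"opec"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```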

international acm sigir conference on research and development in information retrieval | 2007

Web text retrieval with a P2P query-driven index

Gleb Skobeltsyn; Toan Luu; Ivana Podnar Zarko; Martin Rajman; Karl Aberer

In this paper, we present a query-driven indexing/retrieval strategy for efficient full text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations, and (2) the posting lists containing too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable storage and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the transmitted posting lists never exceed a constant size. However, as the number of generated term combinations can still become quite large, we also use term statistics extracted from available query logs to index only such combinations that are frequently present in user queries. Thus, by avoiding the generation of superfluous indexing term combinations, we achieve an additional substantial reduction in bandwidth and storage consumption. As a result, the generated distributed index corresponds to a constantly evolving query-driven indexing structure that efficiently follows current information needs of the users. More precisely, our theoretical analysis and experimental results indicate that, at the price of a marginal loss in retrieval quality for rare queries, the generated index size and network traffic remain manageable even for web-size document collections. Furthermore, our experiments show that at the same time the achieved retrieval quality is fully comparable to the one obtained with a state-of-the-art centralized query engine.
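
The two properties described above, together with the use of query logs, can be sketched in a few lines. This is a centralized toy version under stated assumptions: the function name `build_query_driven_index`, the parameters `min_query_count` and `top_k`, the limit to combinations of one or two terms, and the raw term-frequency score are all illustrative choices, not details taken from the paper, and the distribution of the index over a P2P network is not modelled.

```python
from collections import Counter
from itertools import combinations


def build_query_driven_index(docs, query_log, min_query_count, top_k):
    """Index only term combinations that are frequent in the query log.

    Each posting list is truncated to its top_k highest-scoring documents.
    """
    # Count how often each combination of one or two terms occurs in logged queries.
    combo_counts = Counter()
    for query in query_log:
        terms = sorted(set(query.lower().split()))
        for size in (1, 2):
            combo_counts.update(combinations(terms, size))

    index = {}
    for combo, count in combo_counts.items():
        if count < min_query_count:
            continue  # combination is not requested often enough to be indexed
        scored = []
        for doc_id, text in docs.items():
            words = text.lower().split()
            if all(term in words for term in combo):
                # Toy relevance score: total occurrences of the combination's terms.
                score = sum(words.count(term) for term in combo)
                scored.append((score, doc_id))
        # Highest score first, ties broken by document id; then truncate.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        index[combo] = [doc_id for _, doc_id in scored[:top_k]]
    return index


# Hypothetical documents and query log.
docs = {1: "p2p p2p search", 2: "p2p search engine", 3: "p2p index", 4: "web search"}
query_log = ["p2p search", "p2p search", "p2p index", "web"]
index = build_query_driven_index(docs, query_log, min_query_count=2, top_k=2)
for combo, posting_list in index.items():
    print(combo, posting_list)
```

Combinations that appear only once in the log, such as "index" with "p2p", are never indexed, which mirrors the trade-off stated in the abstract: rare queries may lose some retrieval quality in exchange for a smaller index.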

information retrieval in peer to peer networks | 2006

ALVIS peers: a scalable full-text peer-to-peer retrieval engine

Toan Luu; Fabius Klemm; Ivana Podnar; Martin Rajman; Karl Aberer

We present Alvis peers, a full-text P2P retrieval engine designed to offer retrieval performance comparable to centralized solutions while scaling to a very large number of peers. It is the result of our research efforts within the Alvis project (European FP6 STREP project ALVIS, http://www.alvis.info/), which aims at building a truly distributed semantic search engine. To cope with the problem of unscalable bandwidth consumption in the P2P network, the engine implements a novel retrieval model that indexes highly discriminative keys (HDKs), that is, terms and term sets appearing in a limited number of collection documents. Our prototype is a fully functional retrieval engine built over a structured P2P network. It includes a component for HDK-based indexing and retrieval, and a distributed content-based ranking module. Such an integrated system represents a substantial contribution to the design and development of realistic P2P retrieval systems.

scalable information systems | 2007

Query-driven indexing for scalable peer-to-peer text retrieval

Gleb Skobeltsyn; Toan Luu; Ivana Podnar Žarko; Martin Rajman; Karl Aberer

We present a query-driven algorithm for the distributed indexing of large document collections within structured P2P networks. To cope with bandwidth consumption, which has been identified as the major problem for the standard P2P approach with single-term indexing, we leverage a distributed index that stores at most the top-k document references, and only for carefully chosen indexing term combinations. In addition, since the number of possible term combinations extracted from a document collection can be very large, we propose to use query statistics to index only those combinations that are indeed frequently requested by the users. Thus, by avoiding the maintenance of superfluous indexing information, we achieve a substantial reduction in bandwidth and storage. A specific activation mechanism is applied to continuously update the indexing information according to changes in the query distribution, resulting in an efficient, constantly evolving query-driven indexing structure. We show that the size of the index and the generated indexing/retrieval traffic remain manageable even for web-size document collections at the price of a marginal loss in precision for rare queries. Our theoretical analysis and experimental results provide convincing evidence of the feasibility of the query-driven indexing strategy for large-scale P2P text retrieval. Moreover, our experiments confirm that the retrieval performance is only slightly lower than that obtained with state-of-the-art centralized query engines.

field programmable custom computing machines | 2000

An FPGA-based coprocessor for the parsing of context-free grammars

Cristian Ciressan; Eduardo Sanchez; Martin Rajman; Jean-Cédric Chappelier

This paper presents an FPGA-based implementation of a co-processing unit able to parse context-free grammars of real-life sizes. The application fields of such a parser range from programming language syntactic analysis to very demanding natural language applications where parsing speed is an important issue.

text speech and dialogue | 2004

Rapid Dialogue Prototyping Methodology

Trung H. Bui; Martin Rajman; Miroslav Melichar

This paper is about the automated production of dialogue models. The goal is to propose and validate a methodology that allows the production of finalized dialogue models (i.e. dialogue models specific for given applications) in a few hours. The solution we propose for such a methodology, called the Rapid Dialogue Prototyping Methodology (RDPM), is decomposed into five consecutive main steps, namely: (1) producing the task model; (2) deriving the initial dialogue model; (3) using a Wizard-of-Oz experiment to instantiate the initial dialogue model; (4) using an internal field test to refine the dialogue model; and (5) using an external field test to evaluate the final dialogue model. All five steps will be described in more detail in the document.

international conference on machine learning | 2004

ARCHIVUS: a system for accessing the content of recorded multimodal meetings

Agnes Lisowska; Martin Rajman; Trung H. Bui

This paper describes a multimodal dialogue driven system, ARCHIVUS, that allows users to access and retrieve the content of recorded and annotated multimodal meetings. We describe (1) a novel approach taken in designing the system given the relative inapplicability of standard user requirements elicitation methodologies, (2) the components of ARCHIVUS, and (3) the methodologies that we plan to use to evaluate the system.

very large data bases | 2008

AlvisP2P: scalable peer-to-peer text retrieval in a structured P2P network

Toan Luu; Gleb Skobeltsyn; Fabius Klemm; Maroje Puh; Ivana Podnar Žarko; Martin Rajman; Karl Aberer

In this paper we present the AlvisP2P IR engine, which enables efficient retrieval with multi-keyword queries from a global document collection available in a P2P network. In such a network, each peer publishes its local index and invests a part of its local computing resources (storage, CPU, bandwidth) to maintain a fraction of a global P2P index. This investment is rewarded by the network-wide accessibility of the local documents via the global search facility. The AlvisP2P engine uses an optimized overlay network and relies on novel indexing/retrieval mechanisms that ensure low bandwidth consumption, thus enabling unlimited network growth. Our demonstration shows how an easy-to-install AlvisP2P client can be used to join an existing P2P network, index local (text or even multimedia) documents with collection-specific indexing mechanisms, and control access rights to them.
