Publications


Featured research published by Miguel Martinez-Alvarez.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

Extractive summarisation via sentence removal: condensing relevant sentences into a short summary

Marco Bonzanini; Miguel Martinez-Alvarez; Thomas Roelleke

Many on-line services allow users to describe their opinions about a product or a service through a review. To help other users find the prevailing opinion on a given topic without having to read several reviews, multi-document summarisation is required. This research proposes an approach to extractive summarisation that supports different scoring techniques, such as cosine similarity or divergence, as methods for finding representative sentences. The main contribution of this paper is an algorithm for sentence removal, designed to maximise the score between the summary and the original document. Instead of ranking the sentences and selecting the most important ones, the algorithm iteratively removes unimportant sentences until a desired compression rate is reached. Experimental results show that variations of the sentence-removal algorithm perform well.
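
A minimal Python sketch of the sentence-removal loop described above, assuming cosine similarity over TF-IDF vectors as the scoring technique; all names are illustrative, not the authors' code.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def remove_sentences(sentences, compression=0.3):
        # Iteratively drop sentences until only a fraction `compression`
        # of the original sentences remains.
        document = " ".join(sentences)
        target = max(1, int(len(sentences) * compression))
        vectoriser = TfidfVectorizer().fit([document])

        def score(sents):
            # Similarity between the candidate summary and the full document.
            vectors = vectoriser.transform([" ".join(sents), document])
            return cosine_similarity(vectors[0], vectors[1])[0, 0]

        summary = list(sentences)
        while len(summary) > target:
            # Remove the sentence whose absence hurts the score least.
            best = max(range(len(summary)),
                       key=lambda i: score(summary[:i] + summary[i + 1:]))
            summary.pop(best)
        return summary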


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

Opinion summarisation through sentence extraction: an investigation with movie reviews

Marco Bonzanini; Miguel Martinez-Alvarez; Thomas Roelleke

In on-line reviews, authors often use a short passage to convey their overall feeling about a product or a service. A review as a whole can mention many details that are not in line with this overall feeling, so capturing the key passage is important for understanding the overall sentiment of the review. This paper investigates the use of extractive summarisation in the context of sentiment classification. The aim is to find the summary sentence, or short passage, that conveys the overall sentiment of the review, filtering out potentially noisy information. Experiments on a movie-review dataset show that subjectivity detection plays a central role in building summaries for sentiment classification: subjective extracts carry the same polarity as the full-text reviews, while statistical and positional approaches fail to capture this aspect.
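
A hedged sketch of the subjective-extract idea, using TextBlob's subjectivity and polarity scores as stand-ins for the paper's subjectivity detector and sentiment classifier.

    from textblob import TextBlob

    def subjective_extract(review, threshold=0.5):
        # Keep only the subjective sentences of a review as a single extract.
        sentences = TextBlob(review).sentences
        return " ".join(str(s) for s in sentences
                        if s.sentiment.subjectivity >= threshold)

    def extract_polarity(review):
        # Classify the extract instead of the full review:
        # positive if > 0, negative if < 0.
        return TextBlob(subjective_extract(review)).sentiment.polarity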


Data Warehousing and Knowledge Discovery | 2013

Document Difficulty Framework for Semi-automatic Text Classification

Miguel Martinez-Alvarez; Alejandro Bellogín; Thomas Roelleke

Text classification systems can deal with large datasets at far lower time and human cost than manual classification. This is achieved, however, at the expense of a loss in quality. Semi-Automatic Text Classification (SATC) aims to achieve high quality with minimum human effort by ranking documents according to their estimated certainty of being correctly classified. This paper introduces the Document Difficulty Framework (DDF), a unification of different strategies for estimating document certainty, and its application to SATC. DDF exploits the scores and thresholds computed by any given classifier. Different metrics are obtained by varying the parameters of the three levels on which the framework rests: how to measure the confidence of each document-class pair (evidence), which classes to observe (class), and how to aggregate this knowledge (aggregation). Experiments show that DDF metrics consistently achieve high error reduction while large portions of the collection are automatically classified. Furthermore, DDF outperforms all SATC methods reported in the literature.
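
A minimal sketch of the ranking idea behind DDF. The particular evidence, class and aggregation choices below (threshold margin, best class, single value) are just one assumed configuration; the framework parameterises all three levels.

    import numpy as np

    def ddf_rank(scores, thresholds):
        # scores: (n_docs, n_classes) array of classifier scores.
        # thresholds: per-class decision thresholds, shape (n_classes,).
        # Evidence level: distance of each score from its class threshold.
        margins = scores - thresholds
        # Class level: observe only the best-scoring class.
        # Aggregation level: use that single margin as the certainty.
        certainty = margins.max(axis=1)
        # Most certain documents first; a cut-off on this ranking decides
        # which documents are auto-classified and which go to a human.
        return np.argsort(-certainty)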


International Workshop on Ranking in Databases | 2013

On the modelling of ranking algorithms in probabilistic Datalog

Thomas Roelleke; Marco Bonzanini; Miguel Martinez-Alvarez

TF-IDF, BM25, language modelling (LM), and divergence from randomness (DFR) are popular ranking models. Providing a logical abstraction for information search is important, but implementing ranking algorithms in logical abstraction layers such as probabilistic Datalog raises many challenges regarding expressiveness and scalability. Although these ranking algorithms have probabilistic roots, the ranking score is often not itself a probability, leading to programs that are unsafe from a probabilistic point of view. In this paper, we describe the evolution of probabilistic Datalog to provide the concepts required for modelling ranking algorithms.
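
To see why such scores are unsafe as probabilities, consider plain TF-IDF: the score is a sum of per-term weights and easily exceeds 1. A small illustrative sketch (not the paper's PDatalog formulation):

    import math

    def tf_idf(query_terms, doc_terms, doc_freq, n_docs):
        # Unnormalised TF-IDF: a sum of per-term weights, not a probability.
        score = 0.0
        for term in set(query_terms):
            tf = doc_terms.count(term)
            if tf and doc_freq.get(term):
                score += tf * math.log(n_docs / doc_freq[term])
        return score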


Scalable Uncertainty Management | 2010

Modelling probabilistic inference networks and classification in probabilistic Datalog

Miguel Martinez-Alvarez; Thomas Roelleke

Probabilistic Graphical Models (PGMs) are a well-established approach to modelling uncertain knowledge and reasoning. Since we focus on inference, this paper explores Probabilistic Inference Networks (PINs), a special case of PGMs. PINs, commonly referred to as Bayesian networks, are used in Information Retrieval to model tasks such as classification and ad-hoc retrieval. Intuitively, a probabilistic logical framework such as Probabilistic Datalog (PDatalog) should provide the expressiveness required to model PINs. However, this modelling turned out to be more challenging than expected and required the expressiveness of PDatalog to be extended. Moreover, for IR and for more general tasks, first-generation PDatalog turned out to have expressiveness and scalability bottlenecks. This paper therefore makes a case for second-generation PDatalog, which supports the modelling of PINs. In addition, the paper reports on the implementation of a particular PIN application, Bayesian classifiers, to investigate and demonstrate the feasibility of the proposed approach.
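
A compact sketch of the Bayesian classifier the paper implements as a PIN application, written here in plain Python rather than PDatalog; the helper names are illustrative.

    import math
    from collections import Counter, defaultdict

    def train(docs):
        # docs: list of (tokens, label) pairs.
        priors = Counter(label for _, label in docs)
        term_counts = defaultdict(Counter)
        for tokens, label in docs:
            term_counts[label].update(tokens)
        return priors, term_counts

    def classify(tokens, priors, term_counts, vocab_size):
        # argmax over classes of log P(c) + sum_t log P(t|c),
        # with Laplace smoothing for unseen terms.
        n_docs = sum(priors.values())
        def log_posterior(c):
            total = sum(term_counts[c].values())
            return (math.log(priors[c] / n_docs)
                    + sum(math.log((term_counts[c][t] + 1)
                                   / (total + vocab_size))
                          for t in tokens))
        return max(priors, key=log_posterior)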


European Conference on Information Retrieval | 2016

First International Workshop on Recent Trends in News Information Retrieval (NewsIR’16)

Miguel Martinez-Alvarez; Udo Kruschwitz; Gabriella Kazai; Frank Hopfgartner; David Corney; Ricardo Campos; Dyaa Albakour

The news industry has gone through seismic shifts in the past decade, with digital content and social media completely redefining how people consume news. Readers check for accurate, fresh news from multiple sources throughout the day, using dedicated apps or social media on their smartphones and tablets. At the same time, news publishers rely more and more on social networks and citizen journalism as a front line for breaking news. In this new era of fast-flowing, instant news delivery and consumption, publishers and aggregators have to overcome a great number of challenges. These include the verification or assessment of a source’s reliability; the integration of news with other sources of information; real-time processing of both news content and social streams in multiple languages, in different formats and in high volumes; deduplication; entity detection and disambiguation; automatic summarization; and news recommendation. Although Information Retrieval (IR) applied to news has been a popular research area for decades, fresh approaches are needed due to the changing type and volume of media content available and the way people consume this content. The goal of this workshop is to stimulate discussion around new and powerful uses of IR applied to news sources and the intersection of multiple IR tasks to solve real user problems. To promote research efforts in this area, we released a new dataset consisting of one million news articles to the research community and introduced a data challenge track as part of the workshop.


European Conference on Information Retrieval | 2015

Signal: Advanced Real-Time Information Filtering

Miguel Martinez-Alvarez; Udo Kruschwitz; Wesley Hall; Massimo Poesio

The overload of textual information is an ever-growing problem for modern information filtering systems to address, not least because strategic decisions are heavily influenced by the news of the world. In particular, business opportunities as well as threats can be identified by using up-to-date information from disparate sources, such as articles published by global news providers, but equally those found in local newspapers or relevant blog posts. Common media-monitoring approaches tend to rely on large-scale, manually created Boolean queries. However, in order to be effective and flexible in a business environment, user information needs require complex, adaptive representations that go beyond simple keywords. This demonstration illustrates the approach Signal takes to this problem: a cloud-based architecture that processes and analyses, in real time, all the news of the world and allows its users to specify complex information requirements based on entities, topics, industry-specific terminology, and keywords.
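
A toy sketch of the kind of structured information need described above, combining entities, topics and keywords. All field names and matching rules are hypothetical, not Signal's actual API.

    from dataclasses import dataclass, field

    @dataclass
    class InformationNeed:
        entities: set = field(default_factory=set)   # e.g. {"Acme Corp"}
        topics: set = field(default_factory=set)     # e.g. {"mergers"}
        keywords: set = field(default_factory=set)   # e.g. {"takeover bid"}

        def matches(self, article):
            # article: dict with "entities", "topics" and "text" keys.
            text = article["text"].lower()
            return bool(self.entities & set(article["entities"])
                        and self.topics & set(article["topics"])
                        and any(k in text for k in self.keywords))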


International Conference on the Theory of Information Retrieval | 2013

Mathematical Specification and Logic Modelling in the context of IR

Miguel Martinez-Alvarez; Marco Bonzanini; Thomas Roelleke

Many IR models and tasks rely on a mathematical specification, and, in order to check its correctness, extensive testing and manual inspection are usually carried out. However, a formal guarantee can be particularly difficult, or even impossible, to provide. This poster highlights the relationship between the mathematical specification of IR algorithms and their modelling, using a logic-based abstraction that minimises the gap between the specification and a concrete implementation. As a result, the semantics of the program are well defined and correctness checks can be applied. The methodology is illustrated with the mathematical specification and logic modelling of a Bayesian classifier with Laplace smoothing. Beyond closing the gap between specification and modelling, and making correctness checking of a model's implementation an inherent part of the design process, this work can lead to the automatic translation between a mathematical definition and its modelling.
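
For reference, the Laplace-smoothed term probability of such a classifier takes the standard textbook form (assumed here; the poster's exact notation may differ):

    P(t \mid c) = \frac{n_{t,c} + 1}{n_c + |V|}

where n_{t,c} is the frequency of term t in class c, n_c is the total number of term occurrences in c, and |V| is the vocabulary size.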


International Conference on the Theory of Information Retrieval | 2011

A descriptive approach to classification

Miguel Martinez-Alvarez; Thomas Roelleke

Information systems are nowadays required to be more adaptable and flexible than before in order to deal with the rapidly increasing quantity of available data and with changing information needs. Text Classification (TC) is a useful task that can help to solve different problems in different fields. This paper investigates the application of descriptive approaches to modelling classification. The main objectives are increased abstraction and flexibility, so that expert users are able to customise specific strategies for their needs. The contribution of this paper is two-fold. Firstly, it illustrates that modelling classifiers in a descriptive approach is possible and leads to definitions that stay close to the mathematical formulations; the automatic translation from PDatalog to a mathematical formulation is also discussed. Secondly, quality and efficiency results demonstrate the feasibility of the approach for real-scale collections.


NewsIR@ECIR | 2016

What do a Million News Articles Look Like?

David Corney; Dyaa Albakour; Miguel Martinez-Alvarez; Samir Moussa

Collaboration


Miguel Martinez-Alvarez's top co-authors:

Thomas Roelleke (Queen Mary University of London)
Marco Bonzanini (Queen Mary University of London)
David Corney (University College London)
Ricardo Campos (University of Beira Interior)
Fabrizio Smeraldi (Queen Mary University of London)
Hany Azzam (Queen Mary University of London)