Joel Azzopardi
University of Malta
Publication
Featured research published by Joel Azzopardi.
Algorithms | 2012
Joel Azzopardi; Chris Staff
When an event occurs in the real world, numerous news reports describing it start to appear on different news sites within a few minutes. This may result in a huge amount of information for users, and automated processes may be required to help manage this information. In this paper, we describe a clustering system that can cluster news reports from disparate sources into event-centric clusters, i.e., clusters of news reports describing the same event. A user can identify any RSS feed as a source of news they would like to receive, and our clustering system can cluster reports received from the separate RSS feeds as they arrive without knowing the number of clusters in advance. Our clustering system was designed to function well in an online incremental environment. In evaluating our system, we found that it performs very well at fine-grained clustering, but rather poorly at coarser-grained clustering.
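The online, incremental setting described in this abstract can be illustrated with a minimal single-pass sketch. This is not the authors' system: the bag-of-words vectors, the cosine measure, the centroid update, and the `threshold` value are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_incrementally(reports, threshold=0.3):
    """Assign each report, in arrival order, to the most similar cluster,
    or open a new cluster when no centroid reaches the threshold.
    The number of clusters is never fixed in advance."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [index, ...]}
    for i, text in enumerate(reports):
        vec = Counter(text.lower().split())
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # hypothetical centroid: summed term counts
            best["members"].append(i)
    return [c["members"] for c in clusters]
```

A lower `threshold` merges more reports into each cluster (coarser granularity), while a higher one yields finer-grained clusters.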
Database and Expert Systems Applications | 2015
Chris Staff; Joel Azzopardi; Colin Layfield; Daniel Mercieca
Our unsupervised Search Results Clustering (SRC) system partitions into clusters the top-n results returned by a search engine. We present the results of experiments with our SRC system that performs incremental clustering on document titles and snippets only and does not use external resources, yet which outperforms the best performers to date on the SemEval-2013 Task 11 gold standard. We include Latent Semantic Analysis (LSA) as an optional step, using the snippets themselves as the background corpus. We demonstrate that better results are achieved by leaving the query terms out of the clustering process, and that currently, the version without LSA outperforms the version with LSA.
Advanced Information Networking and Applications | 2012
Joel Azzopardi; Chris Staff
Events occurring in the real world are covered by news reports from different sources. Each report generally contains information that is found in others, but may also contain unique information. To learn all the information about a particular event, a user will need to read all the different reports. This is a duplication of effort since most information will be repeated in the different reports. In our research, we attempt to fuse news reports about the same event into a single coherent document, eliminating repetition but preserving all the information contained in the source reports, using only surface-based methods. Information in each news report is represented by a set of entity-relationship graphs. The graphs representing each report are then merged into a single graph whilst keeping track of the source sentences. The fused report is generated using the maximally expressive set of sentences (the sentences that carry the most information about the entities and their relationships in the news report), whilst ensuring that all entities and relationships are expressed in the fused document. Our document fusion system was evaluated using a set of news reports downloaded from MSNBC News that cite their sources, and also using human evaluation. We show that our system is able to capture most of the information found across different source documents whilst maintaining readability.
Semantic Keyword-based Search on Structured Data Sources | 2016
Joel Azzopardi; Fabio Benedetti; Francesco Guerra; Mihai Lupu
We reproduce recent research results combining semantic and information retrieval methods. Additionally, we expand the existing state of the art by combining the semantic representations with IR methods from the probabilistic relevance framework. We demonstrate a significant increase in performance, as measured by standard evaluation metrics.
Semantic Keyword-based Search on Structured Data Sources | 2016
Colin Layfield; Joel Azzopardi; Chris Staff
Users face the Vocabulary Gap problem when attempting to retrieve relevant textual documents from small databases, especially when there are only a small number of relevant documents, as it is likely that different terms are used in queries and relevant documents to describe the same concept. To enable comparison of results of different approaches to semantic search in small textual databases, the PIKES team constructed an annotated test collection and Gold Standard comprising 35 search queries and 331 articles. We present two different possible solutions. In one, we index an unannotated version of the PIKES collection using Latent Semantic Analysis (LSA) retrieving relevant documents using a combination of query coordination and automatic relevance feedback. Although we outperform prior work, this approach is dependent on the underlying collection, and is not necessarily scalable. In the second approach, we use an LSA Model generated by SEMILAR from a Wikipedia dump to generate a Term Similarity Matrix (TSM). Queries are automatically expanded with related terms from the TSM and are submitted to a term-by-document matrix Vector Space Model of the PIKES collection. Coupled with a combination of query coordination and automatic relevance feedback we also outperform prior work with this approach. The advantage of the second approach is that it is independent of the underlying document collection.
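The query expansion step of the second approach can be sketched as follows. This is only an illustration under stated assumptions: the term similarity matrix is represented here as a hypothetical nested dictionary `tsm`, and the `top_k` and `min_sim` cut-offs are invented parameters, not values reported in the abstract.

```python
def expand_query(query_terms, tsm, top_k=2, min_sim=0.5):
    """Append up to top_k related terms per query term.

    tsm maps a term to {related_term: similarity}; it stands in for a
    Term Similarity Matrix derived from an LSA model.
    """
    expanded = list(query_terms)
    for term in query_terms:
        # Rank this term's neighbours by descending similarity.
        neighbours = sorted(
            tsm.get(term, {}).items(), key=lambda kv: kv[1], reverse=True
        )
        for related, sim in neighbours[:top_k]:
            if sim >= min_sim and related not in expanded:
                expanded.append(related)
    return expanded
```

The expanded term list would then be submitted to the retrieval model in place of the original query. Because `tsm` is built from an external corpus, nothing in this step depends on the collection being searched.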
Practical Applications of Agents and Multi-Agent Systems | 2012
Joel Azzopardi; Chris Staff
The multitude of news reports being published on the WWW may cause information overload on users. In this paper, we describe a news recommendation system whereby news reports are represented using entity-relationship graphs, and the users’ interaction with these news reports in a specialised web portal is monitored in order to construct and maintain user models that store the user’s reading history and also define entities that appear to be of interest to the user. These user models are used to alert individual users when an event has occurred that falls within their area of interest, and to present news reports to users in an adaptive manner – previously seen information is shown in a summarised form. We evaluated our recommendation system using a corpus of news reports downloaded from Yahoo! News. Results obtained indicate that our recommendation system performs better than the baseline system that uses the Rocchio algorithm without negative feedback.
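For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of a Rocchio profile update with the negative-feedback term omitted. The sparse dictionary representation and the `alpha` and `beta` weights are assumptions for illustration; the abstract does not state the settings used.

```python
from collections import Counter


def rocchio_update(profile, relevant_docs, alpha=1.0, beta=0.75):
    """Rocchio update without negative feedback:
    new_profile = alpha * profile + beta * mean(relevant document vectors).

    profile and each document are sparse {term: weight} dictionaries.
    """
    updated = Counter({term: alpha * w for term, w in profile.items()})
    for doc in relevant_docs:
        for term, w in doc.items():
            # Dividing by the number of documents averages their contribution.
            updated[term] += beta * w / len(relevant_docs)
    return dict(updated)
```

Each report the user reads pulls the profile towards that report's terms, and unseen reports can then be ranked by their similarity to the updated profile.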
Semantic Keyword-based Search on Structured Data Sources | 2016
Joel Azzopardi; Dragan Ivanović; Georgia M. Kapitsaki
Digital libraries have become an excellent information resource for researchers. However, users of digital libraries would be served better by having the relevant items ‘pushed’ to them. In this research, we present various automatic recommendation systems to be used in a digital library of Serbian PhD Dissertations. We experiment with the use of Latent Semantic Analysis (LSA) in both content and collaborative recommendation approaches, and evaluate the use of different similarity functions. We find that the best results are obtained when using a collaborative approach that utilises LSA and Pearson similarity.
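A collaborative approach with Pearson similarity can be sketched as below. The sketch leaves out the LSA step entirely and works on raw interaction vectors, so it illustrates only the similarity function and the neighbour-weighted scoring; the data layout (`ratings` as user to list of item scores) and the rule of ignoring non-positively correlated users are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def recommend(target, ratings, top_n=3):
    """Rank items the target user has not interacted with, scoring each by
    the similarity-weighted interactions of positively correlated users."""
    sims = {
        user: pearson(ratings[target], vec)
        for user, vec in ratings.items()
        if user != target
    }
    scores = {}
    for item, seen in enumerate(ratings[target]):
        if seen > 0:
            continue  # skip items the target user already has
        scores[item] = sum(
            sim * ratings[user][item] for user, sim in sims.items() if sim > 0
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

For example, with `ratings = {"a": [1, 1, 0, 0], "b": [1, 1, 1, 0], "c": [0, 0, 0, 1]}`, user `a` correlates positively with `b` only, so item index 2 is ranked first for `a`.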
Remote Sensing of the Ocean, Sea Ice, Coastal Waters, and Large Water Regions 2011 | 2011
Alan Deidun; Aldo Drago; Adam Gauci; Anthony Galea; Joel Azzopardi; F. Melin
The study of spatio-temporal trends for key water quality parameters in Maltese coastal waters is hindered by the lack of systematic observations spanning the full domain over sufficiently long time periods. Satellite data offer an alternative source of information, but require ground truthing against in situ measurements. The aim of this study is to statistically compare MODIS ocean colour data, for a near-shore marine area off the north-east coastline of Malta, with in situ surface chlorophyll-a measurements, and to extract a twelve-month ocean colour data series for the same marine area. Peaks in surface chlorophyll-a concentration occurred in the January-February period, with the lowest values being recorded during the early spring period. Log bias values indicate that the MODIS dataset under-estimates the surface chlorophyll-a values, whilst RMSD and r² values suggest that the match-up between satellite and in situ values is only partly consistent.
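The match-up statistics mentioned in this abstract can be sketched as follows. The abstract does not give formulas, so the definitions here are assumptions based on a convention often used in ocean colour validation: both statistics are computed on base-10 logarithms of the paired concentrations, with a negative log bias meaning the satellite values are lower than the in situ values.

```python
import math


def matchup_stats(satellite, in_situ):
    """Return (log_bias, rmsd) for paired, strictly positive concentrations.

    Assumed definitions:
      log_bias = mean(log10(satellite) - log10(in_situ))
      rmsd     = sqrt(mean((log10(satellite) - log10(in_situ)) ** 2))
    """
    diffs = [math.log10(s) - math.log10(o) for s, o in zip(satellite, in_situ)]
    n = len(diffs)
    log_bias = sum(diffs) / n
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)
    return log_bias, rmsd
```

Under these definitions, a log bias of -1 would correspond to satellite values that are, on average, ten times lower than the in situ measurements.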
Semantic Keyword-based Search on Structured Data Sources | 2017
Joel Azzopardi
The use of personalised recommendation systems to push interesting items to users has become a necessity in the digital world that contains overwhelming amounts of information. One of the most effective ways to achieve this is by considering the opinions of other similar users – i.e. through collaborative techniques. In this paper, we compare the performance of item-based and user-based recommendation algorithms as well as propose an ensemble that combines both systems. We investigate the effect of applying LSA, as well as varying the neighbourhood size on the different algorithms. Finally, we experiment with the inclusion of content-type information in our recommender systems. We find that the most effective system is the ensemble system that uses LSA.
Semantic Keyword-based Search on Structured Data Sources | 2017
Colin Layfield; Dragan Ivanović; Joel Azzopardi
One of the challenges in information retrieval is searching a corpus of documents that may contain multiple languages. This exploratory study expands upon earlier research employing Latent Semantic Analysis (so-called Multi-Lingual Latent Semantic Indexing, or ML-LSI/LSA). We experiment using this approach, and a new one, in a multi-lingual context utilising two similar languages, namely Serbian and Croatian. Traditionally, with an LSA approach, a parallel corpus would be needed in order to train the system by combining identical documents in two languages into one document. We repeat that approach and also experiment with creating a semantic space using the parallel corpus on its own, without merging the documents together, to test the hypothesis that, with very similar languages, the merging of documents may not be required for good results.