Hany Azzam
Queen Mary University of London
Publication
Featured research published by Hany Azzam.
very large data bases | 2008
Thomas Roelleke; Hengzhi Wu; Jun Wang; Hany Azzam
This paper presents a probabilistic relational modelling (implementation) of the major probabilistic retrieval models. Such a high-level implementation is useful since it supports the ranking of any object, it allows for the reasoning across structured and unstructured data, and it gives the software (knowledge) engineer control over ranking and thus supports customisation. The contributions of this paper include the specification of probabilistic SQL (PSQL) and probabilistic relational algebra (PRA), a new relational operator for probability estimation (the relational Bayes), the probabilistic relational modelling of retrieval models, a comparison of modelling retrieval with traditional SQL versus modelling retrieval with PSQL, and a comparison of the performance of probability estimation with traditional SQL versus PSQL. The main findings are that the PSQL/PRA paradigm allows for the description of advanced retrieval models, is suitable for solving large-scale retrieval tasks, and outperforms traditional SQL in terms of abstraction and performance regarding probability estimation.
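As a rough illustration of what a probability-estimation operator such as the relational Bayes does, the sketch below normalises tuple weights within groups defined by an evidence key, then merges duplicate tuples by summing their probabilities. It is plain Python, not PSQL or PRA syntax. The function names (`relational_bayes`, `project_disjoint`), the `term_doc` relation and the choice of `doc` as evidence key are assumptions made for this example and do not come from the paper.

```python
from collections import defaultdict

def relational_bayes(weighted, evidence_key):
    """Divide each tuple's weight by the total weight of the tuples
    that share its evidence-key values (hypothetical helper)."""
    totals = defaultdict(float)
    for prob, row in weighted:
        totals[tuple(row[a] for a in evidence_key)] += prob
    return [
        (prob / totals[tuple(row[a] for a in evidence_key)], row)
        for prob, row in weighted
    ]

def project_disjoint(weighted):
    """Merge identical tuples by summing their probabilities,
    treating them as disjoint events (hypothetical helper)."""
    merged = defaultdict(float)
    for prob, row in weighted:
        merged[tuple(sorted(row.items()))] += prob
    return dict(merged)

# Assumed toy relation: one tuple per term occurrence in a document.
term_doc = [
    (1.0, {"term": "sailing", "doc": "d1"}),
    (1.0, {"term": "boats", "doc": "d1"}),
    (1.0, {"term": "sailing", "doc": "d1"}),
    (1.0, {"term": "sailing", "doc": "d2"}),
]

# Estimate P(term | doc): the evidence key is the document identifier.
p_t_d = project_disjoint(relational_bayes(term_doc, ["doc"]))
for key, prob in sorted(p_t_d.items()):
    print(dict(key), round(prob, 3))
```

Changing the evidence key changes the estimate: an empty key, for example, would normalise over the whole relation instead of per document.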
Proceedings of the Third International Workshop on Keyword Search on Structured Data | 2012
Hany Azzam; Sirvan Yahyaei; Marco Bonzanini; Thomas Roelleke
To search across factual knowledge and content expressed in different data formats, this paper leverages a generic data model (schema) that transforms keyword-based retrieval models and queries into knowledge-oriented models and semantically expressive queries. As each of the transformed retrieval models capitalises on a specific evidence space (term, classification, relationship and attribute), we demonstrate two possible combinations of these spaces, namely macro-based and micro-based. For bare keyword-based queries we demonstrate how the data model can be used to augment the queries with classifications, relationships, etc. that reflect the underlying constraints and objects found in the heterogeneous knowledge bases. Using the IMDb benchmark, the results demonstrate the feasibility and effectiveness of the instantiated retrieval models and the query reformulation process.
exploiting semantic annotations in information retrieval | 2010
Hany Azzam; Thomas Roelleke
We introduce a query rating scheme that identifies the possible interpretations which can be assigned to a semantic query. The interpretations range from the traditional bag-of-words interpretation to more context- and semantic-aware interpretations. The aims of this scheme are to communicate the extent of semantics that is being interpreted for a query and to assign suitable query processing methods for each level of interpretation accordingly.
patent information retrieval | 2009
Iraklis A. Klampanos; Hany Azzam; Thomas Roelleke
Patent retrieval has emerged as an important application of information retrieval. Inherent properties of patent searching, such as large corpora, document length and the use of terminology, have created the need for alternative approaches to searching. Logic-based information retrieval, as it is modelled by DB+IR systems, can accommodate these needs through its power of abstraction and the use of database-friendly query languages. However, there is a trade-off between expressiveness and efficiency. We propose to tackle such efficiency issues through distribution and parallelisation. In this paper we present our arguments in favour of a parallelised patent searching solution built on top of a probabilistic DB+IR system. Our contributions are both conceptual and technical. We demonstrate the flexibility of this approach by modelling two resource selection algorithms in probabilistic logic, expressed in probabilistic Datalog, a rule-based language designed for expressing database-related tasks. Then, we provide early experimental indications which support the feasibility and technical soundness of this approach.
international conference on the theory of information retrieval | 2011
Hany Azzam; Thomas Roelleke
Database technology offers design methodologies to rapidly develop and deploy applications that are easy to understand, document and teach. It can be argued that information retrieval (IR) lacks equivalent methodologies. This poster discusses a generic data model, the Probabilistic Object-Oriented Content Model, that facilitates solving complex IR tasks. The model guides how data and queries are represented and how retrieval strategies are built and customised. Application/task-specific schemas can also be derived from the generic model. This eases the process of tailoring search to a specific task by offering a layered architecture and well-defined schema mappings. Different types of knowledge (facts and content) from varying data sources can also be consolidated into the proposed modelling framework. Ultimately, the data model paves the way for discussing IR-tailored design methodologies.
information retrieval facility conference | 2010
Iraklis A. Klampanos; Hengzhi Wu; Thomas Roelleke; Hany Azzam
Patent searching is a complex retrieval task. An initial document search is only the starting point of a chain of searches and decisions that need to be made by patent searchers. Keyword-based retrieval is adequate for document searching, but it is not suitable for modelling comprehensive retrieval strategies. DB-like and logical approaches are the state-of-the-art techniques to model strategies, reasoning and decision making. In this paper we present the application of logical retrieval to patent searching. The two grand challenges are expressiveness and scalability, where a high degree of expressiveness usually means a loss in scalability. We report how to maintain scalability while offering the expressiveness of logical retrieval required for solving patent search tasks. We present the background of logical retrieval, and how to model data-source selection and result fusion. Moreover, we demonstrate the modelling of a retrieval strategy, a technique by which patent professionals are able to express, store and exchange their strategies and rationales when searching patents or when making decisions. An overview of the architecture and technical details complements the paper, while the evaluation reports preliminary results on how query processing times can be guaranteed, and how quality is affected by trading off responsiveness.
patent information retrieval | 2011
Hany Azzam; Iraklis A. Klampanos; Thomas Roelleke
Patent retrieval has emerged as an important application of information retrieval (IR). It is considered to be a complex search task because patent search requires an extended chain of reasoning beyond basic document retrieval. As logic-based IR is capable of modelling both document retrieval and decision-making, it can be seen as a suitable framework for modelling patent data and search strategies. In particular, we demonstrate logic-based modelling for semantic data in patent documents and retrieval strategies which are tailored to patent search and exploit more than just the text in the documents. Given the expressiveness of logic-based IR, however, there is an attendant compromise on issues of scalability and quality. To address these trade-offs we suggest how a parallelised architecture can ensure that logical IR scales in spite of its expressiveness.
conference on information and knowledge management | 2011
Hany Azzam; Thomas Roelleke; Sirvan Yahyaei
A growing number of applications are built on top of search engines and issue complex structured queries. This paper contributes a customisable ranking-based processing of such queries, specifically SQL. Similar to how term-based statistics are exploited by term-based retrieval models, ranking-aware processing of SQL queries exploits tuple-based statistics that are derived from sources or, more precisely, derived from the relations specified in the SQL query. To implement this ranking-based processing, we leverage PSQL, a probabilistic variant of SQL, to facilitate probability estimation and the generalisation of document retrieval models to be used for tuple retrieval. The result is a general-purpose framework that can interpret any SQL query and then assign a probabilistic retrieval model to rank the results of that query. The evaluation on the IMDB and Monster benchmarks proves that the PSQL-based approach is applicable to (semi-)structured and unstructured data and structured queries.
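To make the analogy between term-based and tuple-based statistics concrete, here is a minimal sketch in plain Python. It derives an IDF-like weight for each attribute value from the relation named in a query, then ranks tuples by the summed weights of the conditions they satisfy. The weighting formula, the function names (`value_idf`, `rank`) and the `movies` relation are assumptions made for this example; the sketch does not reproduce the paper's PSQL-based framework or its retrieval models.

```python
import math

def value_idf(relation, attribute):
    """IDF-like weight per attribute value: -log(n_value / n_tuples).
    Hypothetical statistic, analogous to inverse document frequency."""
    n = len(relation)
    counts = {}
    for row in relation:
        counts[row[attribute]] = counts.get(row[attribute], 0) + 1
    return {value: -math.log(c / n) for value, c in counts.items()}

def rank(relation, conditions):
    """Score each tuple by the summed weights of the equality conditions
    it satisfies, instead of filtering out partial matches."""
    weights = {attr: value_idf(relation, attr) for attr in conditions}
    scored = []
    for row in relation:
        score = sum(
            weights[attr][value]
            for attr, value in conditions.items()
            if row[attr] == value
        )
        scored.append((score, row))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Assumed toy relation standing in for a table referenced by a SQL query.
movies = [
    {"title": "A", "genre": "drama", "year": 2001},
    {"title": "B", "genre": "drama", "year": 2005},
    {"title": "C", "genre": "comedy", "year": 2001},
    {"title": "D", "genre": "drama", "year": 2001},
]

# Roughly: SELECT * FROM movies WHERE genre = 'comedy' AND year = 2001,
# but returning a ranked list rather than a Boolean result set.
ranked = rank(movies, {"genre": "comedy", "year": 2001})
for score, row in ranked:
    print(round(score, 3), row["title"])
```

Rare attribute values contribute more to the score than common ones, which mirrors how rare terms dominate term-based retrieval models.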
LWA | 2013
Thomas Roelleke; Hany Azzam; Marco Bonzanini; Miguel Martinez-Alvarez; Mounia Lalmas
LWA | 2010
Hany Azzam; Thomas Roelleke