Arlind Kopliku
University of Toulouse
Publication
Featured research published by Arlind Kopliku.
ACM Computing Surveys | 2014
Arlind Kopliku; Karen Pinel-Sauvagnat; Mohand Boughanem
Traditional search engines return ranked lists of search results. It is up to the user to scroll through this list, scan the different documents, and assemble the information that fulfills his/her information need. Aggregated search represents a new class of approaches where the information is not only retrieved but also assembled. This is the current evolution in Web search, where diverse content (images, videos, etc.) and relational content (similar entities, features) are included in search results. In this survey, we propose a simple analysis framework for aggregated search and an overview of existing work. We start with work in related domains such as federated search, natural language generation, and question answering. Then we focus on more recent trends, namely cross-vertical aggregated search and relational aggregated search, which are already present in current Web search.
international conference on enterprise information systems | 2015
Max Chevalier; Mohammed El Malki; Arlind Kopliku; Olivier Teste; Ronan Tournier
Not only SQL (NoSQL) databases are becoming increasingly popular and have some interesting strengths such as scalability and flexibility. In this paper, we investigate the use of NoSQL systems for implementing OLAP (On-Line Analytical Processing) systems. More precisely, we are interested in instantiating OLAP systems (from the conceptual level to the logical level) and instantiating an aggregation lattice (optimization). We define a set of rules to map star schemas into two NoSQL models: column-oriented and document-oriented. The experimental part is carried out using the reference benchmark TPC. Our experiments show that our rules can effectively instantiate such systems (star schema and lattice). We also analyze differences between the two NoSQL systems considered. In our experiments, HBase (column-oriented) turns out to be faster than MongoDB (document-oriented) in terms of loading time.
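As a rough illustration of what mapping a star schema into the two NoSQL models could look like, the sketch below turns one fact (with its dimension attributes) into a nested document and into a set of "family:qualifier" cells. The abstract does not spell out the actual mapping rules, so the layouts, the function names `to_document` and `to_column_families`, and the sample row are all assumptions made for illustration.

```python
# Hypothetical star-schema row: one fact joined with its dimension attributes.
# All names and values here are made up for illustration.
fact_row = {
    "measures": {"quantity": 3, "revenue": 59.7},
    "dimensions": {
        "customer": {"name": "C1", "city": "Toulouse"},
        "date": {"day": "2015-04-27", "year": 2015},
    },
}


def to_document(row_id, row):
    """Document-oriented sketch: one document per fact, with each
    dimension and the measures stored as nested sub-documents."""
    doc = {"_id": row_id}
    for name, attrs in row["dimensions"].items():
        doc[name] = dict(attrs)
    doc["measures"] = dict(row["measures"])
    return doc


def to_column_families(row_id, row):
    """Column-oriented sketch: one row key, one column family per
    dimension plus one for the measures, cells keyed 'family:qualifier'."""
    families = {**row["dimensions"], "measures": row["measures"]}
    cells = {}
    for family, attrs in families.items():
        for qualifier, value in attrs.items():
            cells[f"{family}:{qualifier}"] = value
    return row_id, cells


doc = to_document("f1", fact_row)
row_key, cells = to_column_families("f1", fact_row)
```

A document of this shape could be stored in a document store such as MongoDB, and the cell map resembles how a row is addressed in a column-oriented store such as HBase; neither client library is used here.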
conference on information and knowledge management | 2011
Arlind Kopliku; Mohand Boughanem; Karen Pinel-Sauvagnat
In this paper, we propose an attribute retrieval approach which extracts and ranks attributes from HTML tables. We distinguish between class attribute retrieval and instance attribute retrieval. On the one hand, given an instance (e.g. University of Strathclyde) we retrieve from the Web its attributes (e.g. principal, location, number of students). On the other hand, given a class (e.g. universities) represented by a set of instances, we retrieve common attributes of its instances. Furthermore, we show we can reinforce instance attribute retrieval if similar instances are available. Our approach uses HTML tables, which are probably the largest source for attribute retrieval. Three recall-oriented filters are applied over tables to check the following three properties: (i) whether the table is relational, (ii) whether the table has a header, and (iii) the conformity of its attributes and values. Candidate attributes are extracted from tables and ranked with a combination of relevance features. Our approach is shown to have a high recall and a reasonable precision. Moreover, it outperforms state-of-the-art techniques.
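The three table filters named above can be sketched as simple predicates over a table given as a list of rows of strings. The concrete heuristics (minimum size, numeric detection, header length limit, type consistency per column) are assumptions chosen for illustration, not the criteria used in the paper, and the sample table contains toy values.

```python
def _is_number(cell):
    """Return True if the cell parses as a number (commas ignored)."""
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False


def is_relational(table, min_rows=3, min_cols=2):
    """Filter (i), assumed heuristic: rectangular and large enough."""
    if len(table) < min_rows:
        return False
    width = len(table[0])
    return width >= min_cols and all(len(row) == width for row in table)


def has_header(table):
    """Filter (ii), assumed heuristic: first row is non-empty text."""
    return all(cell.strip() and not _is_number(cell) for cell in table[0])


def conforms(table, max_header_len=30):
    """Filter (iii), assumed heuristic: short attribute names, and each
    column holds either only numeric or only non-numeric values."""
    header, body = table[0], table[1:]
    if any(len(cell) > max_header_len for cell in header):
        return False
    for col in range(len(header)):
        kinds = {_is_number(row[col]) for row in body if row[col].strip()}
        if len(kinds) > 1:
            return False
    return True


def candidate_attributes(table):
    """Return header cells (minus the first, assumed to name the
    instances) if the table passes all three filters, else []."""
    if is_relational(table) and has_header(table) and conforms(table):
        return [cell.strip().lower() for cell in table[0][1:]]
    return []


# Toy table with made-up values.
table = [
    ["University", "Location", "Students"],
    ["University of Strathclyde", "Glasgow", "22,000"],
    ["University of Toulouse", "Toulouse", "100,000"],
]
attributes = candidate_attributes(table)
numeric_only = candidate_attributes([["1", "2"], ["3", "4"], ["5", "6"]])
```

Keeping the filters lenient matches the recall-oriented intent described in the abstract: a table is rejected only on clear evidence, and ranking is left to a later step.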
research challenges in information science | 2015
Max Chevalier; Mohammed El Malki; Arlind Kopliku; Olivier Teste; Ronan Tournier
The plethora of data warehouse solutions has created a need to compare these solutions using experimental benchmarks. Existing benchmarks rely mostly on the relational data model and do not take into account other models. In this paper, we propose an extension to a popular benchmark (the Star Schema Benchmark or SSB) that considers non-relational NoSQL models. To avoid the data post-processing required for using this data with NoSQL systems, the data is generated in different formats. To best exploit horizontal scaling, data can be produced in a distributed file system, hence removing disk or partition sizes as a limit for the generated dataset. Experimental work shows improved performance of our new benchmark.
advances in databases and information systems | 2015
Max Chevalier; Mohammed El Malki; Arlind Kopliku; Olivier Teste; Ronan Tournier
NoSQL (Not Only SQL) systems are becoming popular due to known advantages such as horizontal scalability and elasticity. In this paper, we study the implementation of multidimensional data warehouses with column-oriented NoSQL systems. We define mapping rules that transform the conceptual multidimensional data model to logical column-oriented models. We consider three different logical models and we use them to instantiate data warehouses. We focus on data loading, model-to-model conversion and OLAP cuboid computation.
international conference on enterprise information systems | 2015
Max Chevalier; Mohammed El Malki; Arlind Kopliku; Olivier Teste; Ronan Tournier
Traditional OLAP (On-Line Analytical Processing) systems store data in relational databases. Unfortunately, it is difficult to manage big data volumes with such systems. As an alternative, NoSQL systems (Not-only SQL) provide scalability and flexibility for an OLAP system. We define a set of rules to map star schemas and their optimization structure, a precomputed aggregate lattice, into two logical NoSQL models: column-oriented and document-oriented. Using these rules we analyse and implement two decision support systems, one for each model (using MongoDB and HBase). We compare both systems during the phases of data loading (with data generated using the TPC-DS benchmark), lattice generation and querying.
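A precomputed aggregate lattice contains one cuboid per subset of the dimensions, from the finest grouping down to the grand total. The sketch below builds such a lattice in memory with a SUM aggregate; the function name `build_lattice`, the choice of SUM, and the sample facts are assumptions for illustration. For simplicity every cuboid is recomputed from the base facts, whereas a real system would typically derive coarser cuboids from finer ones.

```python
from collections import defaultdict
from itertools import combinations


def build_lattice(facts, dimensions, measure):
    """Return {dimension subset: {group key: summed measure}} for every
    subset of `dimensions` (2**len(dimensions) cuboids in total)."""
    lattice = {}
    for size in range(len(dimensions), -1, -1):
        for group in combinations(dimensions, size):
            totals = defaultdict(float)
            for fact in facts:
                key = tuple(fact[d] for d in group)
                totals[key] += fact[measure]
            lattice[group] = dict(totals)
    return lattice


# Made-up facts for illustration.
facts = [
    {"city": "Toulouse", "year": 2015, "revenue": 10.0},
    {"city": "Toulouse", "year": 2016, "revenue": 5.0},
    {"city": "Paris", "year": 2015, "revenue": 7.0},
]
lattice = build_lattice(facts, ("city", "year"), "revenue")
```

With two dimensions the lattice holds four cuboids: `("city", "year")`, `("city",)`, `("year",)`, and the empty tuple `()` for the grand total.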
international conference on big data | 2015
Max Chevalier; Mohammed El Malki; Arlind Kopliku; Olivier Teste; Ronan Tournier
NoSQL (Not Only SQL) systems are becoming popular due to known advantages such as horizontal scalability and elasticity. In this paper, we study the implementation of data warehouses with document-oriented NoSQL systems. We propose mapping rules that transform the multidimensional data model to logical document-oriented models. We consider three different logical models and we use them to instantiate data warehouses. We focus on data loading, model-to-model conversion and OLAP cuboid computation.
acm ieee joint conference on digital libraries | 2011
Arlind Kopliku; Karen Pinel-Sauvagnat; Mohand Boughanem
In this paper, we propose an attribute retrieval approach which extracts and ranks attributes from Web tables. We combine simple heuristics to filter out improbable attributes and we rank attributes based on frequencies and a table match score. Ranking is reinforced with external evidence from Web search, DBPedia and Wikipedia. Our approach can be applied to any instance (e.g. Canada) to retrieve its attributes (capital, GDP). It is shown to have a much higher recall than DBPedia and Wikipedia and to work better than lexico-syntactic rules for the same purpose.
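One possible way to combine attribute frequency with a table match score is a weighted sum, sketched below. The abstract does not define the table match score or the combination formula, so this is purely illustrative: the match score is assumed to be a number in [0, 1] supplied per table, and `rank_attributes` and its `weight` parameter are hypothetical names.

```python
from collections import defaultdict


def rank_attributes(tables, weight=0.5):
    """Rank attribute names from `tables`, a list of
    (match_score, [attribute names]) pairs.

    Assumed scoring: weight * (share of tables containing the attribute)
    + (1 - weight) * (best match score among those tables)."""
    counts = defaultdict(int)
    best_match = defaultdict(float)
    for match_score, attributes in tables:
        for attr in {a.strip().lower() for a in attributes}:
            counts[attr] += 1
            best_match[attr] = max(best_match[attr], match_score)
    total = len(tables)
    scores = {
        attr: weight * counts[attr] / total + (1 - weight) * best_match[attr]
        for attr in counts
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda attr: (-scores[attr], attr))


# Made-up candidate tables for the instance "Canada".
tables = [
    (0.9, ["Capital", "GDP"]),
    (0.4, ["capital", "Flag"]),
    (0.2, ["Anthem"]),
]
ranking = rank_attributes(tables)
```

External evidence such as Web search hit counts could enter as further weighted terms in the same sum, but that step is not shown here.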
web intelligence | 2011
Arlind Kopliku; Firas Damak; Karen Pinel-Sauvagnat; Mohand Boughanem
Major search engines perform what is known as Aggregated Search (AS). They integrate results coming from different vertical search engines (images, videos, news, etc.) with typical Web search results. Aggregated search is relatively new and its advantages need to be evaluated. Some existing work has already tried to evaluate the interest (usefulness) of aggregated search as well as the effectiveness of the existing approaches. However, most evaluation methodologies were based (i) on what we call relevance by intent (i.e. search results were not shown to real users), and (ii) on short text queries. In this paper, we conducted a user study which was designed to revisit and compare the interest of aggregated search, by exploiting both relevance by intent and relevance by content, and using both short text and fixed need queries. This user study allowed us to analyze the distribution of relevant results across different verticals, and to show that AS helps to identify complementary relevant sources for the same information need. Comparison between relevance by intent and relevance by content showed that relevance by intent introduces a bias in evaluation. Discussion of the results also allowed us to identify some useful insights concerning the evaluation of AS approaches.
international conference on enterprise information systems | 2016
Max Chevalier; Mohammed El Malki; Arlind Kopliku; Olivier Teste; Ronan Tournier
There is an increasing interest in NoSQL (Not Only SQL) systems developed in the area of Big Data as candidates for implementing multidimensional data warehouses, due to the data structuring and storage capabilities they offer. In this paper, we study implementation and modeling issues for data warehousing with document-oriented systems, a class of NoSQL systems. We study four different mappings of the multidimensional conceptual model to document data models. We focus on formalization and cross-model comparison. Experiments cover important features of data warehouses including data loading, OLAP cuboid computation and querying. Document-oriented systems are also compared to relational systems.