Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Djellel Eddine Difallah is active.

Publication


Featured research published by Djellel Eddine Difallah.


international world wide web conferences | 2012

ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking

Gianluca Demartini; Djellel Eddine Difallah; Philippe Cudré-Mauroux

We tackle the problem of entity linking for large collections of online pages. Our system, ZenCrowd, identifies entities from natural language text using state-of-the-art techniques and automatically connects them to the Linked Open Data cloud. We show how one can take advantage of human intelligence to improve the quality of the links by dynamically generating micro-tasks on an online crowdsourcing platform. We develop a probabilistic framework to make sensible decisions about candidate links and to identify unreliable human workers. We evaluate ZenCrowd in a real deployment and show how a combination of probabilistic reasoning and crowdsourcing techniques can significantly improve the quality of the links while limiting the amount of work performed by the crowd.
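The abstract does not spell out the probabilistic framework. A minimal sketch of one plausible ingredient, reliability-weighted aggregation of crowd votes over candidate links, is shown below; the function and the reliability scores are illustrative assumptions, not ZenCrowd's actual model:

```python
from collections import defaultdict
from math import log

def aggregate_votes(votes, reliability):
    """Score each candidate link by summing log-odds-weighted worker votes.

    votes       -- list of (worker_id, candidate_link, accept: bool)
    reliability -- dict worker_id -> estimated P(worker answers correctly)
    Returns candidate links ranked by aggregated score.
    """
    scores = defaultdict(float)
    for worker, link, accept in votes:
        p = min(max(reliability.get(worker, 0.5), 0.01), 0.99)  # clamp away from 0/1
        weight = log(p / (1 - p))            # reliable workers count more
        scores[link] += weight if accept else -weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical usage: two reliable workers agree, one weak worker disagrees.
votes = [("w1", "dbpedia:Zurich", True),
         ("w2", "dbpedia:Zurich", True),
         ("w3", "dbpedia:Zurich", False)]
print(aggregate_votes(votes, {"w1": 0.9, "w2": 0.8, "w3": 0.55}))
```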


international world wide web conferences | 2013

Pick-a-crowd: tell me what you like, and i'll tell you what to do

Djellel Eddine Difallah; Gianluca Demartini; Philippe Cudré-Mauroux

Crowdsourcing makes it possible to build hybrid online platforms that combine scalable information systems with the power of human intelligence to complete tasks that are difficult for current algorithms to tackle. Examples include hybrid database systems that use the crowd to fill in missing values or to sort items along subjective dimensions such as picture attractiveness. Current approaches to crowdsourcing adopt a pull methodology: tasks are published on specialized Web platforms where workers pick their preferred tasks on a first-come, first-served basis. While this approach has many advantages, such as simplicity and short completion times, it does not guarantee that a task is performed by the most suitable worker. In this paper, we propose and extensively evaluate a different crowdsourcing approach based on a push methodology. Our proposed system carefully selects which workers should perform a given task based on worker profiles extracted from social networks. Workers and tasks are automatically matched using an underlying categorization structure that exploits entities extracted from the task descriptions on the one hand, and categories liked by the user on social platforms on the other. We experimentally evaluate our approach on tasks of varying complexity and show that our push methodology consistently yields better results than the usual pull strategies.
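A minimal sketch of such a push-style match follows, assuming worker profiles are reduced to sets of liked category labels and scored with Jaccard similarity; both reductions are my own illustrative assumptions, not the paper's categorization structure:

```python
def rank_workers(task_categories, worker_profiles, k=3):
    """Rank workers for a task by the overlap between the categories
    extracted from the task description and the categories each worker
    likes on a social platform (Jaccard similarity).

    task_categories -- set of category labels for the task
    worker_profiles -- dict worker_id -> set of liked category labels
    """
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    ranked = sorted(worker_profiles.items(),
                    key=lambda kv: jaccard(task_categories, kv[1]),
                    reverse=True)
    return [worker for worker, _ in ranked[:k]]

# Hypothetical usage: push a soccer-related task to fans of the topic.
profiles = {"w1": {"soccer", "fifa"}, "w2": {"cooking"}, "w3": {"soccer"}}
print(rank_workers({"soccer", "world cup"}, profiles, k=2))
```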


symposium on cloud computing | 2014

Reservation-based Scheduling: If You're Late Don't Blame Us!

Carlo Curino; Djellel Eddine Difallah; Chris Douglas; Subru Krishnan; Raghu Ramakrishnan; Sriram Rao

The continuous shift towards data-driven approaches to business, and growing attention to improving return on investment (ROI) for cluster infrastructures, are generating new challenges for big-data frameworks. Systems originally designed for big batch jobs now handle an increasingly complex mix of computations. Moreover, they are expected to guarantee stringent SLAs for production jobs and to minimize latency for best-effort jobs. In this paper, we introduce reservation-based scheduling, a new approach to this problem. We develop our solution around four key contributions: 1) we propose a reservation definition language (RDL) that allows users to declaratively reserve access to cluster resources, 2) we formalize planning of current and future cluster resources as a Mixed-Integer Linear Programming (MILP) problem and propose scalable heuristics, 3) we adaptively distribute resources between production jobs and best-effort jobs, and 4) we integrate all of this in a scalable system named Rayon, built on top of Hadoop/YARN. We evaluate Rayon on a 256-node cluster against workloads derived from Microsoft, Yahoo!, Facebook, and Cloudera clusters. To enable practical use of Rayon, we open-sourced our implementation as part of Apache Hadoop 2.6.
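The abstract names the MILP formulation without giving it. As a rough illustration only (not Rayon's actual model, whose RDL semantics are richer), placing reservations j that each demand D_j resource-hours inside a window [s_j, d_j] on a cluster of capacity C can be written as:

```latex
% Illustrative reservation-placement MILP (simplified; not Rayon's actual model).
% x_{j,t}: resources granted to reservation j in time step t
% y_t    : 1 if the cluster is in use at time t (active steps are minimized)
\begin{align*}
\min_{x, y} \quad & \textstyle\sum_t y_t \\
\text{s.t.} \quad
  & \textstyle\sum_{t=s_j}^{d_j} x_{j,t} \ge D_j
    && \forall j \quad \text{(demand met within the window)} \\
  & \textstyle\sum_j x_{j,t} \le C \, y_t
    && \forall t \quad \text{(cluster capacity)} \\
  & x_{j,t} \ge 0, \quad y_t \in \{0, 1\}.
\end{align*}
```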


cloud data management | 2012

Benchmarking OLTP/web databases in the cloud: the OLTP-bench framework

Carlo Curino; Djellel Eddine Difallah; Andrew Pavlo; Philippe Cudré-Mauroux

Benchmarking is a key activity in building and tuning data management systems, but the lack of reference workloads and of a common platform makes it a time-consuming and painful task. The need for such a tool is heightened with the advent of cloud computing, with its pay-per-use cost models, shared multi-tenant infrastructures, and lack of control over system configuration. Benchmarking is the only avenue for users to validate the quality of service they receive and to optimize their deployments for performance and resource utilization. In this talk, we present our experience in building several ad hoc benchmarking infrastructures for various research projects targeting OLTP DBMSs, ranging from traditional relational databases to main-memory distributed systems and cloud-based scalable architectures. We also discuss our struggle to build meaningful micro-benchmarks and to gather workloads representative of real-world applications to stress-test our systems. This experience motivates the OLTP-Bench project, a batteries-included benchmarking infrastructure designed for, and tested on, several relational DBMSs and cloud-based database-as-a-service (DBaaS) offerings. OLTP-Bench can control the transaction rate, mixture, and workload skew dynamically during the execution of an experiment, allowing the user to simulate a multitude of practical scenarios that are typically hard to test (e.g., time-evolving access skew). Moreover, the infrastructure provides an easy way to monitor the performance and resource consumption of the database under test. We also introduce the ten included workloads, derived from synthetic micro-benchmarks, popular benchmarks, and real-world applications, and show how they can be used to investigate various performance and resource-consumption characteristics of a data management system. We showcase the effectiveness of our benchmarking infrastructure and the usefulness of the workloads we selected by reporting sample results from hundreds of side-by-side comparisons of popular DBMSs and DBaaS offerings.
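The dynamic rate-and-mixture control described above can be pictured with a toy open-loop driver; the phase API below is my own sketch under that assumption, not OLTP-Bench's actual interface:

```python
import random
import time

def run_phase(txn_mix, rate_per_sec, duration_sec, execute):
    """Issue transactions at a target rate with a weighted type mixture.

    txn_mix      -- dict txn_name -> weight, e.g. {"NewOrder": 0.45, ...}
    rate_per_sec -- target request rate for this phase
    duration_sec -- how long the phase lasts
    execute      -- callback that runs one transaction by name
    Varying (txn_mix, rate_per_sec) across phases emulates dynamic
    control of rate and mixture during an experiment.
    """
    names, weights = zip(*txn_mix.items())
    interval = 1.0 / rate_per_sec
    deadline = time.time() + duration_sec
    while time.time() < deadline:
        execute(random.choices(names, weights=weights, k=1)[0])
        time.sleep(interval)          # open-loop pacing (simplified)

# Hypothetical usage: shift the workload toward reads halfway through.
run_phase({"read": 0.5, "write": 0.5}, rate_per_sec=100, duration_sec=60,
          execute=lambda name: None)
run_phase({"read": 0.9, "write": 0.1}, rate_per_sec=200, duration_sec=60,
          execute=lambda name: None)
```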


international semantic web conference | 2015

SANAPHOR: Ontology-Based Coreference Resolution

Roman Prokofyev; Alberto Tonon; Michael Luggen; Loic Vouilloz; Djellel Eddine Difallah; Philippe Cudré-Mauroux

We tackle the problem of resolving coreferences in textual content by leveraging Semantic Web techniques. Specifically, we focus on noun phrases that corefer with identifiable entities appearing in the text; the challenge in this context is to improve coreference resolution by leveraging the semantic annotations that can potentially be added to the identified mentions. Our system, SANAPHOR, first applies state-of-the-art techniques to extract entities, noun phrases, and candidate coreferences. Then, we propose an approach to type noun phrases using an inverted index built on top of a knowledge graph, e.g., DBpedia. Finally, we use the semantic relatedness of the resulting types to improve on the state-of-the-art techniques by splitting and merging coreference clusters. We evaluate SANAPHOR on CoNLL datasets and show how our techniques consistently improve the state of the art in coreference resolution.
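The cluster-splitting step can be illustrated with a small sketch; the type-assignment dictionary and the majority rule for untyped mentions are illustrative assumptions, not SANAPHOR's exact algorithm:

```python
from collections import Counter, defaultdict

def split_by_type(cluster, mention_types):
    """Split a candidate coreference cluster whenever its mentions carry
    incompatible semantic types (e.g., dbo:Person vs. dbo:Organisation).

    cluster       -- list of mention strings
    mention_types -- dict mention -> type URI, obtained e.g. from an
                     inverted index over a knowledge graph such as DBpedia
    Untyped mentions stay with the cluster's majority type.
    """
    by_type, untyped = defaultdict(list), []
    for mention in cluster:
        t = mention_types.get(mention)
        (by_type[t] if t else untyped).append(mention)
    if not by_type:
        return [cluster]
    majority = Counter({t: len(ms) for t, ms in by_type.items()}).most_common(1)[0][0]
    by_type[majority].extend(untyped)
    return list(by_type.values())

# Hypothetical usage: one textual cluster mixing a person with a foundation.
print(split_by_type(
    ["Barack Obama", "Obama", "the president", "Obama Foundation"],
    {"Barack Obama": "dbo:Person", "Obama": "dbo:Person",
     "Obama Foundation": "dbo:Organisation"}))
# -> [['Barack Obama', 'Obama', 'the president'], ['Obama Foundation']]
```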


international world wide web conferences | 2016

Scheduling Human Intelligence Tasks in Multi-Tenant Crowd-Powered Systems

Djellel Eddine Difallah; Gianluca Demartini; Philippe Cudré-Mauroux

Micro-task crowdsourcing has become a popular approach to effectively tackling complex data management problems such as data linkage, missing values, or schema matching. However, the crowdsourced back-end operators of crowd-powered systems typically yield higher latencies than machine-processable operators, mainly due to inherent efficiency differences between humans and machines. This problem can be further exacerbated by a lack of workers on the target crowdsourcing platform, or when the workers are shared unequally among a number of competing requesters, including concurrent users from the same organization who execute crowdsourced queries of different types, priorities, and prices. Under such conditions, a crowd-powered system acts mostly as a proxy to the crowdsourcing platform, and it is therefore very difficult to provide efficiency guarantees to its end-users. Scheduling is the traditional way of tackling such problems in computer science, by prioritizing access to shared resources. In this paper, we propose a new crowdsourcing system architecture that leverages scheduling algorithms to optimize task execution in a shared-resource environment, in this case a crowdsourcing platform. Our study aims at assessing the efficiency of the crowd in settings where multiple types of tasks are run concurrently. We present extensive experimental results comparing i) different multi-tenant crowdsourcing jobs, including a workload derived from real traces, and ii) different scheduling techniques tested with real crowd workers. Our experimental results show that task scheduling can be leveraged to achieve fairness and reduce query latency in multi-tenant crowd-powered systems, although with very different tradeoffs compared to traditional settings that do not involve human factors.
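As a minimal sketch of the fairness idea, the toy scheduler below rotates over per-requester queues; round-robin is my stand-in here, while the paper evaluates its own set of scheduling policies:

```python
from collections import deque

class FairHitScheduler:
    """Round-robin scheduler over per-requester task queues. Each time
    the crowdsourcing platform can accept more work, the next requester
    in the rotation gets to post one task, so no tenant starves."""

    def __init__(self):
        self.queues = {}          # requester -> deque of tasks
        self.rotation = deque()   # requesters in round-robin order

    def submit(self, requester, task):
        if requester not in self.queues:
            self.queues[requester] = deque()
            self.rotation.append(requester)
        self.queues[requester].append(task)

    def next_task(self):
        """Return (requester, task), or None if every queue is drained."""
        for _ in range(len(self.rotation)):
            requester = self.rotation[0]
            self.rotation.rotate(-1)
            if self.queues[requester]:
                return requester, self.queues[requester].popleft()
        return None

# Hypothetical usage: tasks from two tenants interleave fairly.
s = FairHitScheduler()
for i in range(3):
    s.submit("tenant_a", f"a{i}")
s.submit("tenant_b", "b0")
print([s.next_task() for _ in range(4)])   # a0, b0, a1, a2
```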


web search and data mining | 2018

Demographics and Dynamics of Mechanical Turk Workers

Djellel Eddine Difallah; Elena Filatova; Panos Ipeirotis

We present an analysis of the population dynamics and demographics of Amazon Mechanical Turk workers based on the results of a survey we conducted over a period of 28 months, with more than 85K responses from 40K unique participants. The demographics survey is ongoing (as of November 2017), and the results are available at http://demographics.mturk-tracker.com; we provide an API for researchers to download the survey data. We use techniques from the field of ecology, in particular the capture-recapture technique, to understand the size and dynamics of the underlying population. We also demonstrate how to model and account for the inherent selection biases in such surveys. Our results indicate that there are more than 100K workers available on Amazon's crowdsourcing platform, that worker participation on the platform follows a heavy-tailed distribution, and that at any given time there are more than 2K active workers. We also show that the half-life of a worker on the platform is around 12-18 months and that the rate of arrival of new workers balances the rate of departures, keeping the overall worker population relatively stable. Finally, we demonstrate how to estimate the biases of different demographic groups toward participating in the survey tasks, and show how to correct for such biases. Our methodology is generic and can be applied to any platform where we are interested in understanding the dynamics and demographics of the underlying user population.
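The simplest capture-recapture estimator, Lincoln-Petersen, conveys the intuition behind this approach (the paper uses more refined ecology models; the numbers below are hypothetical, not the paper's data):

```python
def lincoln_petersen(n1, n2, m):
    """Estimate a closed population's size from two survey waves using
    the classic Lincoln-Petersen capture-recapture estimator:
        N_hat = n1 * n2 / m
    n1 -- workers seen in the first wave
    n2 -- workers seen in the second wave
    m  -- workers seen in both waves ("recaptured")
    """
    if m == 0:
        raise ValueError("no recaptures: the estimate is unbounded")
    return n1 * n2 / m

# Hypothetical numbers: 2,000 and 2,100 respondents, 40 seen in both waves.
print(round(lincoln_petersen(2000, 2100, 40)))  # -> 105000
```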


international semantic web conference | 2016

VoldemortKG: Mapping schema.org and Web Entities to Linked Open Data

Alberto Tonon; Victor Felder; Djellel Eddine Difallah; Philippe Cudré-Mauroux

Increasingly, webpages mix entities coming from various sources and represented in different ways. The same entity may thus be both described using schema.org annotations and referenced through a text anchor pointing to its Wikipedia page. Often, those representations provide complementary information, which goes unexploited as long as the representations remain disjoint. We explored the extent to which entities represented in different ways repeat on the Web, how they are related, and how they complement (or link to) each other. Our initial experiments showed that we can unveil a previously unexploited knowledge graph by applying simple instance matching techniques to a large collection of schema.org annotations and Wikipedia. The resulting knowledge graph aggregates entities (often tail entities) scattered across several webpages, and complements existing Wikipedia entities with new facts and properties. In order to facilitate further investigation of how to mine such information, we are releasing (i) an excerpt of all Common Crawl webpages containing both Wikipedia and schema.org annotations, (ii) the toolset to extract this information and to perform knowledge graph construction and mapping onto DBpedia, as well as (iii) the resulting knowledge graph (VoldemortKG) obtained via label matching techniques.
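A naive label-matching instance matcher in the spirit of the abstract might look like the sketch below; the normalization rules and data shapes are my illustrative assumptions, not the released toolset's actual pipeline:

```python
import re

def normalize(label):
    """Lowercase, strip punctuation, and collapse whitespace so that
    surface variants of the same entity name compare equal."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", label.lower())).strip()

def match_entities(schema_org_items, wikipedia_anchors):
    """Pair a schema.org item with a Wikipedia anchor found on the same
    page whenever their normalized names coincide.

    schema_org_items  -- list of (item_name, properties_dict)
    wikipedia_anchors -- dict anchor_text -> Wikipedia URL
    """
    anchors = {normalize(text): url for text, url in wikipedia_anchors.items()}
    return [(name, props, anchors[normalize(name)])
            for name, props in schema_org_items
            if normalize(name) in anchors]

# Hypothetical usage: merge a schema.org Person with its Wikipedia page.
items = [("Tim Berners-Lee", {"jobTitle": "Professor"})]
anchors = {"Tim Berners-Lee": "https://en.wikipedia.org/wiki/Tim_Berners-Lee"}
print(match_entities(items, anchors))
```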


international world wide web conferences | 2014

Hippocampus: answering memory queries using transactive search

Michele Catasta; Alberto Tonon; Djellel Eddine Difallah; Gianluca Demartini; Karl Aberer; Philippe Cudré-Mauroux

Memory queries denote queries in which the user tries to recall something from his/her past personal experiences. Neither Web search nor structured queries can effectively answer this type of query, even when supported by human computation solutions. In this paper, we propose a new approach to answering memory queries that we call Transactive Search: the user-requested memory is reconstructed by a group of people who exchange pieces of personal memories in order to reassemble the overall memory, which is stored in a distributed fashion among the members of the group. We experimentally compare our proposed approach against a set of advanced search techniques, including the use of machine learning methods over the Web of Data, online social networks, and human computation techniques. Experimental results show that Transactive Search significantly outperforms existing search approaches for memory queries.


international conference on management of data | 2015

BenchPress: Dynamic Workload Control in the OLTP-Bench Testbed

Dana Van Aken; Djellel Eddine Difallah; Andrew Pavlo; Carlo Curino; Philippe Cudré-Mauroux

Benchmarking is an essential activity when choosing database products, tuning systems, and understanding the trade-offs of the underlying engines. But the workloads available for this effort are often restrictive and unrepresentative of the ever-changing requirements of modern database applications. We recently introduced OLTP-Bench, an extensible testbed for benchmarking relational databases that is bundled with 15 workloads. The key features that set this framework apart are its ability to tightly control the request rate and to dynamically change the transaction mixture. This allows an administrator to compose complex execution targets that recreate real system loads, and it opens the door to new research directions involving tuning for special execution patterns and multi-tenancy. In this demonstration, we highlight OLTP-Bench's important features through the BenchPress game, which allows users to control the benchmark behavior in real time for multiple database management systems.

Collaboration


Dive into Djellel Eddine Difallah's collaborations.

Top Co-Authors

Michele Catasta
École Polytechnique Fédérale de Lausanne
View shared research outputs

Andrew Pavlo
Carnegie Mellon University
View shared research outputs

Karl Aberer
École Polytechnique Fédérale de Lausanne
View shared research outputs