Dora Erdos | Researchain

Publication

Featured researches published by Dora Erdos.

conference on information and knowledge management | 2013

Discovering facts with Boolean tensor Tucker decomposition

Dora Erdos; Pauli Miettinen

Open Information Extraction (Open IE) has gained increasing research interest in recent years. The first step in Open IE is to extract raw subject--predicate--object triples from the data. These raw triples are rarely usable per se, and need additional post-processing. To that end, we propose the use of Boolean Tucker tensor decomposition to simultaneously find the entity and relation synonyms and the facts connecting them from the raw triples. Our method represents the synonym sets and facts using (sparse) binary matrices and tensors that can be efficiently stored and manipulated. We consider the presentation of the problem as a Boolean tensor decomposition as one of this paper's main contributions. To study the validity of this approach, we use a recent algorithm for scalable Boolean Tucker decomposition. We validate the results with empirical evaluation on a new semi-synthetic data set, generated to faithfully reproduce real-world data features, as well as with real-world data from an existing Open IE extractor. We show that our method obtains high precision while the low recall can easily be remedied by considering the original data together with the decomposition.
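As a rough illustration of the Boolean Tucker model described in this abstract, the sketch below rebuilds a binary subject-predicate-object tensor from three binary factor matrices (the synonym sets) and a binary core tensor (the facts). It is not the authors' algorithm, which has to find these factors; the function name `boolean_tucker_reconstruct` and the toy data are assumptions made for this example.

```python
from itertools import product


def boolean_tucker_reconstruct(core, A, B, C):
    """Return X with X[i][j][k] = OR over (p, q, r) of
    core[p][q][r] AND A[i][p] AND B[j][q] AND C[k][r]."""
    P, Q, R = len(core), len(core[0]), len(core[0][0])
    X = [[[0] * len(C) for _ in B] for _ in A]
    for i, j, k in product(range(len(A)), range(len(B)), range(len(C))):
        X[i][j][k] = int(any(
            core[p][q][r] and A[i][p] and B[j][q] and C[k][r]
            for p, q, r in product(range(P), range(Q), range(R))
        ))
    return X


# Hypothetical toy data: 2 subject strings, 2 predicate strings, 2 object strings.
A = [[1], [1]]   # both subject strings belong to entity synonym set 0
B = [[1], [1]]   # both predicate strings belong to relation synonym set 0
C = [[1], [0]]   # only object string 0 belongs to object synonym set 0
core = [[[1]]]   # one fact: (subject set 0, relation set 0, object set 0)

X = boolean_tucker_reconstruct(core, A, B, C)
```

In this toy case the single core fact expands to every combination of synonymous subject and predicate strings with object string 0, which is the sense in which the decomposition connects synonyms and facts.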

siam international conference on data mining | 2015

A Divide-and-Conquer Algorithm for Betweenness Centrality

Dora Erdos; Vatche Ishakian; Azer Bestavros; Evimaria Terzi

Given a set of target nodes S in a graph G we define the betweenness centrality of a node v with respect to S as the fraction of shortest paths among nodes in S that contain v. For this setting we describe Brandes++, a divide-and-conquer algorithm that can efficiently compute the exact values of betweenness scores. Brandes++ uses Brandes, the most widely used algorithm for betweenness computation, as its subroutine. It achieves notably faster running times by applying Brandes on significantly smaller networks than the input graph, and many of its computations can be done in parallel. The degree of speedup achieved by Brandes++ depends on the community structure of the input network as well as the size of S. Our experiments with real-life networks reveal Brandes++ achieves an average of 10-fold speedup over Brandes, while there are networks where this speedup is 75-fold. We have made our code public to benefit the research community.
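The definition at the start of this abstract can be checked on small graphs with a brute-force baseline. The sketch below is neither Brandes++ nor Brandes' accumulation scheme. It assumes an undirected, unweighted graph given as an adjacency dict, sums the per-pair fraction of shortest paths over unordered pairs of S, and does not credit a node on paths where it is an endpoint. These conventions and the function names are assumptions made for this example.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """BFS from source: hop distances and number of shortest paths to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness_wrt_targets(adj, targets):
    """Sum over unordered pairs {s, t} in targets of the fraction of
    shortest s-t paths that pass through each node v (v not in {s, t})."""
    info = {s: bfs_counts(adj, s) for s in targets}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(targets, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:              # s and t are disconnected
            continue
        for v in adj:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score


# A 4-cycle a-b-c-d-a: the two shortest a-c paths go through b and d.
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
scores = betweenness_wrt_targets(cycle, ["a", "c"])
```

This baseline runs one BFS per target on the full graph, which is the cost that the abstract says Brandes++ reduces by working on smaller networks.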

very large data bases | 2012

The filter-placement problem and its application to minimizing information multiplicity

Dora Erdos; Vatche Ishakian; Andrei Lapets; Evimaria Terzi; Azer Bestavros

In many information networks, data items -- such as updates in social networks, news flowing through interconnected RSS feeds and blogs, measurements in sensor networks, route updates in ad-hoc networks -- propagate in an uncoordinated manner: nodes often relay information they receive to neighbors, independent of whether or not these neighbors received the same information from other sources. This uncoordinated data dissemination may result in significant, yet unnecessary communication and processing overheads, ultimately reducing the utility of information networks. To alleviate the negative impacts of this information multiplicity phenomenon, we propose that a subset of nodes (selected at key positions in the network) carry out additional information filtering functionality. These nodes are responsible for the removal (or significant reduction) of the redundant data items relayed through them. We refer to such nodes as filters. We formally define the Filter Placement problem as a combinatorial optimization problem, and study its computational complexity for different types of graphs. We also present polynomial-time approximation algorithms and scalable heuristics for the problem. Our experimental results, which we obtained through extensive simulations on synthetic and real-world information flow networks, suggest that in many settings a relatively small number of filters is fairly effective in removing a large fraction of redundant information.
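To make the information multiplicity phenomenon concrete, the sketch below counts how many copies of a single item each node receives under a simplified model. The model is an assumption for illustration, not taken from the paper: the network is a DAG with one source, every ordinary node forwards every copy it receives to each out-neighbor, and a filter forwards at most one copy. The function name `copies_received` and the toy graph are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def copies_received(preds, source, filters=frozenset()):
    """Count copies of one item received by each node of a DAG.

    preds maps every node to the list of its in-neighbors.
    Assumed model: the source emits one copy; an ordinary node emits
    as many copies as it received; a filter emits at most one copy.
    """
    received, emitted = {}, {}
    for v in TopologicalSorter(preds).static_order():
        if v == source:
            received[v], emitted[v] = 0, 1
            continue
        received[v] = sum(emitted[u] for u in preds.get(v, ()))
        emitted[v] = min(received[v], 1) if v in filters else received[v]
    return received


# Hypothetical DAG: two stacked "diamonds" s -> {a, b} -> c -> {d, e} -> f.
preds = {"s": [], "a": ["s"], "b": ["s"], "c": ["a", "b"],
         "d": ["c"], "e": ["c"], "f": ["d", "e"]}

no_filter = sum(copies_received(preds, "s").values())
filter_at_c = sum(copies_received(preds, "s", filters={"c"}).values())
filter_at_f = sum(copies_received(preds, "s", filters={"f"}).values())
```

Under these assumptions a filter at c cuts the total number of received copies from 12 to 8, while a filter at the sink f changes nothing, which hints at why the choice of filter positions is an optimization problem.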

international conference on data mining | 2013

Walk 'n' Merge: A Scalable Algorithm for Boolean Tensor Factorization

Dora Erdos; Pauli Miettinen

Tensors are becoming increasingly common in data mining, and consequently, tensor factorizations are becoming more important tools for data miners. When the data is binary, it is natural to ask if we can factorize it into binary factors while simultaneously making sure that the reconstructed tensor is still binary. Such factorizations, called Boolean tensor factorizations, can provide improved interpretability and find Boolean structure that is hard to express using normal factorizations. Unfortunately, the algorithms for computing Boolean tensor factorizations do not usually scale well. In this paper we present a novel algorithm for finding Boolean CP and Tucker decompositions of large and sparse binary tensors. In our experimental evaluation we show that our algorithm can handle large tensors and accurately reconstructs the latent Boolean structure.
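The point about the reconstructed tensor staying binary can be seen in a small sketch. Below, each rank-1 component of a Boolean CP decomposition is stored sparsely as a triple of index sets; combining components with OR corresponds to a set union of their cells, whereas ordinary addition would produce the value 2 where components overlap. This only illustrates the model, not the Walk 'n' Merge algorithm itself, and the function names and toy components are assumptions made for this example.

```python
from itertools import product


def boolean_cp_cells(components):
    """Cells equal to 1 in the Boolean sum (OR) of rank-1 binary tensors.

    Each component is a triple (rows, cols, tubes) of index sets, i.e. a
    sparse encoding of the outer product of three binary vectors.
    """
    cells = set()
    for rows, cols, tubes in components:
        cells.update(product(rows, cols, tubes))
    return cells


def integer_cp_counts(components):
    """Cell values under ordinary addition of the same rank-1 tensors."""
    counts = {}
    for rows, cols, tubes in components:
        for cell in product(rows, cols, tubes):
            counts[cell] = counts.get(cell, 0) + 1
    return counts


# Two hypothetical 2x2x1 blocks that overlap in the single cell (1, 1, 0).
components = [({0, 1}, {0, 1}, {0}), ({1, 2}, {1, 2}, {0})]
cells = boolean_cp_cells(components)
counts = integer_cp_counts(components)
```

Here the Boolean reconstruction has 7 nonzero cells, all equal to 1, while the integer sum assigns 2 to the overlapping cell and so is no longer a binary tensor.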

international conference on data mining | 2012

Reconstructing Graphs from Neighborhood Data

Dora Erdos; Rainer Gemulla; Evimaria Terzi

Consider a social network and suppose that we are given the number of common friends between each pair of users. Can we reconstruct the underlying network? Similarly, consider a set of documents and the words that appear in them. If we know the number of common words for every pair of documents, as well as the number of common documents for every pair of words, can we infer which words appear in which documents? In this paper, we develop a general methodology for answering questions like the ones above. We formalize these questions in what we call the Reconstruct problem: Given information about the common neighbors of nodes in a network, our goal is to reconstruct the hidden binary matrix that indicates the presence or absence of relationships between individual nodes. We propose an effective and practical heuristic, which exploits properties of the singular value decomposition of the hidden binary matrix. More specifically, we show that using the available neighborhood information, we can reconstruct the hidden matrix by finding the components of its singular value decomposition and then combining them appropriately. Our extensive experimental study suggests that our methods are able to reconstruct binary matrices of different characteristics with up to 100% accuracy.
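For orientation, the sketch below computes the kind of neighborhood data the Reconstruct problem starts from: given a binary document-by-word matrix M, the pairwise common-word counts are the entries of M times its transpose, and the pairwise common-document counts are the entries of the transpose times M. It shows only this forward step, not the SVD-based heuristic from the paper. The function name is hypothetical, and keeping the diagonal (each document's or word's own degree) is an assumption of this example.

```python
def common_neighbor_counts(M):
    """Return (row_counts, col_counts) for a binary matrix M.

    row_counts[i][j] = number of columns where rows i and j are both 1
    col_counts[a][b] = number of rows where columns a and b are both 1
    """
    columns = list(zip(*M))
    row_counts = [[sum(x * y for x, y in zip(r1, r2)) for r2 in M] for r1 in M]
    col_counts = [[sum(x * y for x, y in zip(c1, c2)) for c2 in columns]
                  for c1 in columns]
    return row_counts, col_counts


# Hypothetical hidden matrix: 2 documents (rows) over 3 words (columns).
M = [[1, 1, 0],
     [0, 1, 1]]
doc_counts, word_counts = common_neighbor_counts(M)
```

The reconstruction task described in the abstract runs in the opposite direction: recover M when only `doc_counts` and `word_counts` are known.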

SIAM Journal on Discrete Mathematics | 2014

Sink-stable sets of digraphs

Dora Erdos; András Frank; Krisztián Kun

We introduce the notion of sink-stable sets of a digraph and prove a min-max formula for the maximum cardinality of the union of sink-stable sets. The results imply a recent min-max theorem of Abeledo and Atkinson on the Clar number of bipartite plane graphs and a sharpening of Minty's coloring theorem. We also exhibit a link to min-max results of Bessy and Thomassé and of Sebő on cyclic stable sets.

knowledge discovery and data mining | 2013

Repetition-aware content placement in navigational networks

Dora Erdos; Vatche Ishakian; Azer Bestavros; Evimaria Terzi

Arguably, the most effective technique to ensure wide adoption of a concept (or product) is by repeatedly exposing individuals to messages that reinforce the concept (or promote the product). Recognizing the role of repeated exposure to a message, in this paper we propose a novel framework for the effective placement of content: Given the navigational patterns of users in a network, e.g., web graph, hyperlinked corpus, or road network, and given a model of the relationship between content-adoption and frequency of exposition, we define the repetition-aware content-placement (RACP) problem as that of identifying the set of B nodes on which content should be placed so that the expected number of users adopting that content is maximized. The key contribution of our work is the introduction of memory into the navigation process, by making user conversion dependent on the number of her exposures to that content. This dependency is captured using a conversion model that is general enough to capture arbitrary dependencies. Our solution to this general problem builds upon the notion of absorbing random walks, which we extend appropriately in order to address the technicalities of our definitions. Although we show the RACP problem to be NP-hard, we propose a general and efficient algorithmic solution. Our experimental results demonstrate the efficacy and the efficiency of our methods in multiple real-world datasets obtained from different application domains.

siam international conference on data mining | 2011

A framework for the evaluation and management of network centrality

Vatche Ishakian; Dora Erdos; Evimaria Terzi; Azer Bestavros

educational data mining | 2015

Personalized Education; Solving a Group Formation and Scheduling Problem for Educational Content.

Sanaz Bahargam; Dora Erdos; Azer Bestavros; Evimaria Terzi

siam international conference on data mining | 2017