Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Francesca Spezzano is active.

Publication


Featured research published by Francesca Spezzano.


Knowledge Discovery and Data Mining | 2015

VEWS: A Wikipedia Vandal Early Warning System

Srijan Kumar; Francesca Spezzano; V. S. Subrahmanian

We study the problem of detecting vandals on Wikipedia before any human or known vandalism detection system reports them, so that potential vandals can be flagged and presented early to Wikipedia administrators. We leverage multiple classical ML approaches, but develop three novel sets of features. Our Wikipedia Vandal Behavior (WVB) approach uses a novel set of user editing patterns as features to classify some users as vandals. Our Wikipedia Transition Probability Matrix (WTPM) approach uses a set of features derived from a transition probability matrix, reduced via a neural net auto-encoder, to classify some users as vandals. The VEWS approach merges the previous two approaches. Without using any information (e.g. reverts) provided by other users, these algorithms each have over 85% classification accuracy. Moreover, when temporal recency is considered, accuracy rises to almost 90%. We carry out detailed experiments on a new dataset we have created consisting of about 33K Wikipedia users (including both a blacklist and a whitelist of editors) and containing 770K edits. We describe specific behaviors that distinguish vandals from non-vandals. We show that VEWS beats ClueBot NG and STiki, the best-known vandalism detection algorithms today. Moreover, VEWS detects far more vandals than ClueBot NG and, on average, detects them 2.39 edits before ClueBot NG when both detect the vandal. However, we show that the combination of VEWS and ClueBot NG gives a fully automated vandal early warning system with even higher accuracy.
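
The WTPM idea can be made concrete with a small sketch. This is not the authors' implementation: it flattens a row-normalized transition-count matrix into a feature vector and trains an off-the-shelf classifier, whereas the paper additionally compresses the matrix with a neural auto-encoder. The edit states and training data below are invented for illustration.

```python
# Hedged sketch of the transition-matrix idea behind WTPM (not the paper's code).
# Each user is a sequence of abstract edit "states"; the three states here
# (0 = meta-page edit, 1 = re-edit of same article, 2 = new article) are made up.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_STATES = 3

def transition_features(edit_states):
    """Row-normalized transition-count matrix, flattened into a feature vector."""
    m = np.zeros((N_STATES, N_STATES))
    for a, b in zip(edit_states, edit_states[1:]):
        m[a, b] += 1
    row_sums = m.sum(axis=1, keepdims=True)
    m = np.divide(m, row_sums, out=np.zeros_like(m), where=row_sums > 0)
    return m.ravel()

# Toy training data: edit-state sequences plus vandal/benign labels (made up).
sequences = [[0, 1, 1, 2], [2, 2, 0, 1], [1, 1, 1, 1], [0, 2, 2, 2]]
labels = [1, 0, 1, 0]  # 1 = vandal, 0 = benign

X = np.array([transition_features(s) for s in sequences])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict([transition_features([0, 1, 1, 1])]))
```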


Advances in Social Networks Analysis and Mining | 2014

Accurately detecting trolls in Slashdot Zoo via decluttering

Srijan Kumar; Francesca Spezzano; V. S. Subrahmanian

Online social networks like Slashdot bring valuable information to millions of users, but their value depends on the integrity of their user base. Unfortunately, there are many “trolls” on Slashdot who post misinformation and compromise system integrity. In this paper, we develop a general algorithm called TIA (short for Troll Identification Algorithm) to classify users of an online “signed” social network as malicious (e.g. trolls on Slashdot) or benign (i.e. normal, honest users). Though applicable to many signed social networks, TIA has been tested on troll detection in Slashdot Zoo under a wide variety of parameter settings. Its running time is faster than that of many past algorithms, and it is significantly more accurate than existing methods.
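
As a point of reference for the setting only (this is not TIA itself, which adds a "decluttering" step and an iterative ranking on top), a minimal signed-degree baseline looks like this; the edge list is hypothetical:

```python
# Baseline for signed-network troll detection (not the authors' TIA):
# users whose incoming edges are mostly negative are flagged as candidates.
from collections import defaultdict

# Hypothetical signed edges: (rater, ratee, +1 friend / -1 foe).
edges = [("alice", "bob", +1), ("alice", "troll1", -1),
         ("bob", "troll1", -1), ("troll1", "alice", -1),
         ("carol", "bob", +1), ("carol", "troll1", -1)]

signed_in = defaultdict(int)
for rater, ratee, sign in edges:
    signed_in[ratee] += sign

# Negative net signed in-degree -> candidate troll.
candidates = [u for u, s in signed_in.items() if s < 0]
print(dict(signed_in), candidates)
```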


Synthesis Lectures on Data Management | 2012

Incomplete Data and Data Dependencies in Relational Databases

Sergio Greco; Cristian Molinaro; Francesca Spezzano

The chase has long been used as a central tool to analyze dependencies and their effect on queries. It has been applied to several relevant problems in database theory, such as query optimization, query containment and equivalence, dependency implication, and database schema design. Recent years have seen a renewed interest in the chase as an important tool in several database applications, such as data exchange and integration, query answering over incomplete data, and many others. It is well known that the chase algorithm might not terminate; thus, in order for it to find practical applicability, it is crucial to identify cases where its termination is guaranteed. Another important aspect to consider when dealing with the chase is that it can introduce null values into the database, thereby leading to incomplete data. Thus, in several scenarios where the chase is used, the problem of dealing with data dependencies and incomplete data arises. This book discusses fundamental issues concerning data dependencies and incomplete data, with a particular focus on the chase and its applications in different database areas. We report recent results on the crucial issue of identifying conditions that guarantee chase termination. Different database applications where the chase is a central tool are discussed, with particular attention devoted to query answering in the presence of data dependencies and to database schema design. Table of Contents: Introduction / Relational Databases / Incomplete Databases / The Chase Algorithm / Chase Termination / Data Dependencies and Normal Forms / Universal Repairs / Chase and Database Applications
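
For readers unfamiliar with the chase, a minimal sketch of a single tuple-generating dependency being chased shows both the fixpoint behavior and how labeled nulls introduce incompleteness. The relations and the dependency are toy examples, not taken from the book.

```python
# Minimal sketch of the chase for one tuple-generating dependency (TGD):
#   Emp(name, dept) -> exists m. Mgr(dept, m)
# Relation names and data are invented for illustration.
import itertools

_null_counter = itertools.count()

def fresh_null():
    return f"N{next(_null_counter)}"  # labeled null: the source of incompleteness

def chase_emp_mgr(db):
    """Repeatedly apply the TGD until every department has some manager."""
    changed = True
    while changed:
        changed = False
        for (_, dept) in list(db["Emp"]):
            if not any(d == dept for (d, _) in db["Mgr"]):
                db["Mgr"].add((dept, fresh_null()))  # satisfy the dependency
                changed = True
    return db

db = {"Emp": {("ann", "cs"), ("bob", "math")}, "Mgr": {("cs", "carl")}}
print(chase_emp_mgr(db))  # adds Mgr("math", N0) with a labeled null
```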


IEEE Transactions on Knowledge and Data Engineering | 2015

Checking Chase Termination: Cyclicity Analysis and Rewriting Techniques

Sergio Greco; Francesca Spezzano; Irina Trubitsyna

The aim of this paper is to present more general criteria and techniques for chase termination. We first present extensions of the well-known stratification criterion and introduce a new criterion, called local stratification, which generalizes both super-weak acyclicity and stratification-based criteria (including the class of inductively restricted constraints). Next, the paper presents a rewriting algorithm that transforms the original set of constraints Σ into an “equivalent” set Σα and verifies the structural properties for chase termination on Σα. Rewriting the constraints allows us to recognize larger classes of constraints for which chase termination is guaranteed. In particular, we show that if Σ satisfies chase termination condition T, then the rewritten set Σα satisfies T as well, but the converse is not true; that is, there are significant classes of constraints for which Σα satisfies T and Σ does not. We also propose a more general rewriting algorithm that produces as output an equivalent set of dependencies together with a Boolean value stating whether a form of cyclicity has been detected. The new rewriting technique and the acyclicity check allow us to introduce the class of acyclic constraints, which generalizes local stratification and guarantees that all chase sequences are finite, with length polynomial in the size of the input database.
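
The criteria in this paper generalize simpler structural tests such as weak acyclicity. As a hedged sketch of that baseline only (the position-graph encoding below is a toy, not the paper's rewriting algorithm): build a directed graph over relation positions, mark edges induced by existentially quantified variables as "special", and reject if any cycle passes through a special edge.

```python
# Sketch of the classic weak-acyclicity test (a baseline the paper generalizes).
# Nodes are relation positions like ("R", 1); the toy edges are invented.
import networkx as nx

g = nx.DiGraph()
for u, v, special in [(("R", 1), ("S", 1), False),
                      (("S", 1), ("R", 1), True)]:  # exists-variable edge
    g.add_edge(u, v, special=special)

def weakly_acyclic(g):
    for cycle in nx.simple_cycles(g):
        ring = list(zip(cycle, cycle[1:] + cycle[:1]))
        if any(g.edges[u, v]["special"] for u, v in ring):
            return False  # a special edge on a cycle -> chase may not terminate
    return True

print(weakly_acyclic(g))  # False: this toy constraint set fails the test
```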


Communications of the ACM | 2014

Reshaping terrorist networks

Francesca Spezzano; V. S. Subrahmanian; Aaron Mannes

To destabilize terrorist organizations, the STONE algorithms identify a set of operatives whose removal would maximally reduce lethality.


Advances in Social Networks Analysis and Mining | 2013

STONE: shaping terrorist organizational network efficiency

Francesca Spezzano; V. S. Subrahmanian; Aaron Mannes

This paper focuses primarily on the Person Successor Problem (PSP): when a terrorist is removed from a terrorist network, who is most likely to take their place? We leverage the solution to PSP to predict the new terrorist network after removal of a set of terrorists and to answer the question: which set of k (k > 0) terrorists should be removed in order to minimize the lethality of the terrorist network? We propose a theoretical model to study these questions, taking into account the fact that terrorists may have different individual capabilities. We develop an algorithm for PSP in which analysts can specify the conditions an individual needs to satisfy in order to replace another person. We test the correctness of our algorithm on a real-world partial network dataset for two terrorist groups, Al-Qaeda and Lashkar-e-Taiba, where we have ground truth about who replaced whom, as well as on a synthetic dataset where experts estimated who replaced whom. Building on the solution to PSP, we develop an algorithm to identify which set of k people to remove from a terrorist network to minimize the organization's efficiency (formalized as an objective function in several different ways).
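
To make the optimization target concrete, here is a greedy stand-in (not STONE itself: it ignores successor replacement and individual capabilities) that removes k nodes so as to minimize the network's global efficiency:

```python
# Greedy k-node removal minimizing global efficiency -- a simplified stand-in
# for the kind of objective STONE optimizes, not the paper's algorithm.
import networkx as nx

def greedy_removal(g, k):
    g = g.copy()
    removed = []
    for _ in range(k):
        # Pick the node whose removal leaves the least efficient network.
        best = min(g.nodes,
                   key=lambda n: nx.global_efficiency(
                       g.subgraph(set(g.nodes) - {n})))
        g.remove_node(best)
        removed.append(best)
    return removed

g = nx.barbell_graph(4, 1)  # two cliques joined by a single bridge node
print(greedy_removal(g, 2))  # the bridge node goes first
```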


Conference on Information and Knowledge Management | 2016

DePP: A System for Detecting Pages to Protect in Wikipedia

Kelsey Suyehira; Francesca Spezzano

Wikipedia is based on the idea that anyone can make edits to the website in order to create reliable and crowd-sourced content. Yet, under the cover of internet anonymity, some users make changes to the website that do not align with Wikipedia's intended uses. For this reason, Wikipedia allows some pages of the website to become protected, where only certain users can make revisions to the page. This allows administrators to protect pages from vandalism, libel, and edit wars. However, with over five million pages on Wikipedia, it is impossible for administrators to monitor all pages and manually enforce page protection. In this paper we consider for the first time the problem of deciding whether a page should be protected or not in a collaborative environment such as Wikipedia. We formulate the problem as a binary classification task and propose a novel set of features to decide which pages to protect based on (i) users' page revision behavior and (ii) page categories. We tested our system, called DePP, on a new dataset we built consisting of 13.6K pages (half protected and half unprotected) and 1.9M edits. Experimental results show that DePP reaches 93.24% classification accuracy and significantly improves over baselines.
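
DePP's framing, a binary classifier over per-page features, can be sketched as follows; the two features below are simplified stand-ins for the paper's revision-behavior features, and the toy pages are invented:

```python
# Hedged sketch of DePP's binary-classification framing (not its feature set).
import numpy as np
from sklearn.linear_model import LogisticRegression

def page_features(revisions):
    """revisions: list of (user, was_reverted) pairs for one page."""
    users = {u for u, _ in revisions}
    n = len(revisions)
    return [n / max(len(users), 1),            # edits per distinct user
            sum(r for _, r in revisions) / n]  # fraction of reverted edits

# Toy pages: contested pages tend to see many reverts from few accounts.
pages = [[("u1", 1), ("u1", 1), ("u2", 1)],
         [("u3", 0), ("u4", 0), ("u5", 0)],
         [("u6", 1), ("u6", 1), ("u6", 0)],
         [("u7", 0), ("u8", 0), ("u9", 1)]]
y = [1, 0, 1, 0]  # 1 = protect, 0 = leave open (made up)

X = np.array([page_features(p) for p in pages])
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```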


Conference on Information and Knowledge Management | 2016

Bad Actors in Social Media

Francesca Spezzano

Bad actors seriously compromise social media every day by threatening the safety of users and the integrity of content. This keynote speech will give an overview of state-of-the-art social network analysis, data mining, and machine learning techniques to detect bad actors in social media. More specifically, we will describe both general methods, which are platform-independent or valid for any type of malicious user, and methods specific to a particular social media platform (Twitter, Facebook, Slashdot, Wikipedia, Instagram) and/or a given type of bad actor (bots, spammers, trolls, vandals, cyberbullies). We will group these methods into four broad categories, namely (i) active, (ii) content-based, (iii) social network-based, and (iv) behavior-based methods, and show their effectiveness in enforcing cybersafety.


Intelligence and Security Informatics | 2015

SPINN: Suspicion prediction in nuclear networks

Ian A. Andrews; Srijan Kumar; Francesca Spezzano; V. S. Subrahmanian

The best known analyses to date of nuclear proliferation networks are qualitative analyses of networks consisting of just hundreds of nodes and edges. We propose SPINN, a computational framework that performs the following tasks. Starting from existing lists of sanctioned entities, SPINN automatically builds a highly augmented network by scraping connections between individuals, companies, and government organizations from sources like LinkedIn and public company data from Bloomberg. By analyzing this open-source information alone, we have built a network of over 74K nodes and 1.09M edges, containing a smaller whitelist and a blacklist. We develop numerous “features” of nodes in such networks that take both intrinsic node properties and network properties into account, and based on these, we develop methods to classify previously unclassified nodes as suspicious or unsuspicious. On 10-fold cross-validation on ground truth data, we obtain a Matthews Correlation Coefficient for our best classifier of just over 0.9. We show that of the 10 most relevant features for distinguishing between suspicious and non-suspicious nodes, the top 8 are network-related measures, including a novel notion of suspicion rank.
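
The evaluation setup described in the abstract, 10-fold cross-validation scored with the Matthews Correlation Coefficient, can be sketched directly; the three node features below are hypothetical placeholders, not SPINN's actual feature set:

```python
# Sketch of SPINN's evaluation protocol only; features and data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, matthews_corrcoef

rng = np.random.default_rng(0)
# Columns: [in_degree, pagerank, sanctioned_neighbor_share] -- all hypothetical.
X = rng.random((200, 3))
y = (X[:, 2] + 0.1 * rng.standard_normal(200) > 0.5).astype(int)

mcc = make_scorer(matthews_corrcoef)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=10, scoring=mcc)
print(scores.mean())  # mean MCC across the 10 folds
```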


IEEE Transactions on Knowledge and Data Engineering | 2015

An Effective GPU-Based Approach to Probabilistic Query Confidence Computation

Edoardo Serra; Francesca Spezzano

In recent years, probabilistic data management has received a lot of attention due to several applications that deal with uncertain data: RFID systems, sensor networks, data cleaning, scientific and biomedical data management, and approximate schema mappings. Query evaluation is a challenging problem in probabilistic databases, proved to be #P-hard. A general method for query evaluation is based on the lineage of the query and reduces the query evaluation problem to computing the probability of a propositional formula. The main approaches proposed in the literature to approximate confidence computation for probabilistic queries are based on Monte Carlo simulation or on compiling the formula into decision diagrams (e.g., d-trees). The former runs in polynomial time but requires a very large number of iterations, while the latter is polynomial for easy queries but may be exponential in the worst case. We designed a new optimized Monte Carlo algorithm that drastically reduces the number of iterations, and we propose an efficient parallel version that we implemented on GPU. Thanks to the high degree of parallelism provided by the GPU, combined with the linear speedup of our algorithm, we significantly reduce the long running time required by a sequential Monte Carlo algorithm. Experimental results show that our algorithm is efficient enough to be comparable with the formula compilation approach, while having the significant advantage of avoiding exponential behavior.
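
The baseline being optimized here is naive Monte Carlo estimation of a lineage formula's probability: sample possible worlds by including each tuple independently with its probability, and count how often the formula comes out true. A vectorized sketch (the formula and probabilities are a toy example, not from the paper) also hints at why the evaluation is embarrassingly parallel and GPU-friendly:

```python
# Naive Monte Carlo confidence estimation for a toy lineage formula:
#   phi = (x1 AND x2) OR x3  over three independent probabilistic tuples.
import numpy as np

p = np.array([0.5, 0.6, 0.1])  # tuple probabilities (hypothetical)
n_samples = 1_000_000

rng = np.random.default_rng(0)
worlds = rng.random((n_samples, 3)) < p  # one sampled possible world per row
phi = (worlds[:, 0] & worlds[:, 1]) | worlds[:, 2]

# Exact value: 0.5*0.6 + 0.1 - 0.5*0.6*0.1 = 0.37; the estimate converges to it.
print(phi.mean())
```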

Collaboration


Dive into Francesca Spezzano's collaborations.
