Andrew G. West
University of Pennsylvania
Publications
Featured research published by Andrew G. West.
international conference on computational linguistics | 2011
B. Thomas Adler; Luca de Alfaro; Santiago Moisés Mola-Velasco; Paolo Rosso; Andrew G. West
Wikipedia is an online encyclopedia which anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith, such as the introduction of spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The performance of the resulting joint system improves on the state of the art achieved by all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of the three approaches, both for the task of discovering fresh vandalism and for the task of locating vandalism in the complete set of Wikipedia revisions.
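As a rough illustration of the kind of integration the abstract describes, a minimal sketch (not the paper's implementation) might combine per-edit scores from three subsystems into a single vandalism score; the subsystem names, scores, and weights below are invented placeholders.

# Illustrative sketch only: combine per-edit scores from three hypothetical
# subsystems (metadata, reputation, NLP) into one vandalism score.

def combined_vandalism_score(edit_scores, weights=(0.4, 0.3, 0.3)):
    """edit_scores: dict with keys 'metadata', 'reputation', 'nlp',
    each a probability-like score in [0, 1]. Weights are placeholders."""
    w_meta, w_rep, w_nlp = weights
    return (w_meta * edit_scores["metadata"]
            + w_rep * edit_scores["reputation"]
            + w_nlp * edit_scores["nlp"])

if __name__ == "__main__":
    example = {"metadata": 0.82, "reputation": 0.64, "nlp": 0.71}
    print(round(combined_vandalism_score(example), 3))  # 0.733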
european workshop on system security | 2010
Andrew G. West; Sampath Kannan; Insup Lee
Blatantly unproductive edits undermine the quality of the collaboratively edited encyclopedia, Wikipedia. They not only disseminate dishonest and offensive content, but force editors to waste time undoing such acts of vandalism. Language processing has been applied to combat these malicious edits, but, as with email spam, these filters are evadable and computationally complex. Meanwhile, recent research has shown spatial and temporal features to be effective in mitigating email spam, while remaining lightweight and robust. In this paper, we leverage the spatio-temporal properties of revision metadata to detect vandalism on Wikipedia. An administrative form of reversion called rollback enables the tagging of malicious edits, which are contrasted with non-offending edits along numerous dimensions. Crucially, none of these features require inspection of the article or revision text. Ultimately, a classifier is produced which flags vandalism at performance comparable to the natural-language efforts we intend to complement (85% accuracy at 50% recall). The classifier is scalable (processing 100+ edits per second) and has been used to locate over 5,000 manually confirmed incidents of vandalism outside our labeled set.
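A minimal sketch of the kind of metadata-only features alluded to above; the field names and the small feature set here are assumptions for illustration, not the paper's actual features.

# Illustrative sketch: derive spatio-temporal features from revision
# metadata alone, with no inspection of article or diff text.
import ipaddress
from datetime import datetime, timezone

def metadata_features(rev):
    """rev is a hypothetical dict: 'timestamp' (datetime), 'editor' (str),
    'comment' (str), and 'prev_timestamp' (datetime)."""
    ts = rev["timestamp"]
    try:
        ipaddress.ip_address(rev["editor"])
        is_anonymous = True          # IP-address editors are unregistered
    except ValueError:
        is_anonymous = False
    return {
        "hour_of_day": ts.hour,                                        # temporal
        "weekday": ts.weekday(),                                       # temporal
        "seconds_since_prev_rev": (ts - rev["prev_timestamp"]).total_seconds(),
        "is_anonymous": is_anonymous,                                  # who is editing
        "has_edit_comment": bool(rev["comment"].strip()),
    }

if __name__ == "__main__":
    rev = {"timestamp": datetime(2010, 3, 1, 2, 15, tzinfo=timezone.utc),
           "prev_timestamp": datetime(2010, 2, 28, 23, 5, tzinfo=timezone.utc),
           "editor": "192.0.2.7", "comment": ""}
    print(metadata_features(rev))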
ACM Computing Surveys | 2013
Jian Chang; Krishna K. Venkatasubramanian; Andrew G. West; Insup Lee
Web-based malware is a growing threat to today's Internet security. Attacks of this type are prevalent and lead to serious security consequences. Millions of malicious URLs are used as distribution channels to propagate malware all over the Web. After being infected, victim systems fall under the control of attackers, who can utilize them for various cyber crimes such as stealing credentials, spamming, and distributed denial-of-service attacks. Moreover, it has been observed that traditional security technologies such as firewalls and intrusion detection systems have only limited capability to mitigate this new problem. In this article, we survey the state-of-the-art research regarding the analysis of, and defense against, Web-based malware attacks. First, we study the attack model, the root cause, and the vulnerabilities that enable these attacks. Second, we analyze the status quo of the Web-based malware problem. Third, three categories of defense mechanisms are discussed in detail: (1) building honeypots with virtual machines or signature-based detection systems to discover existing threats; (2) using code analysis and testing techniques to identify the vulnerabilities of Web applications; and (3) constructing reputation-based blacklists or smart sandbox systems to protect end users from attacks. We show that these three categories of approaches form an extensive solution space to the Web-based malware problem. Finally, we compare the surveyed approaches and discuss possible future research directions.
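As a toy illustration of the third defense category (reputation-based blacklists), a sketch might check a URL's hostname against a locally cached blacklist; the domains below are placeholders, not a real threat feed, and this is not any surveyed system's design.

# Illustrative sketch: a reputation-style check of URL hostnames against a
# locally cached blacklist (example domains only, not a real threat feed).
from urllib.parse import urlparse

BLACKLISTED_DOMAINS = {"malware.example", "drive-by.example"}

def is_blacklisted(url):
    host = (urlparse(url).hostname or "").lower()
    # Match the host itself or any parent domain on the list.
    parts = host.split(".")
    return any(".".join(parts[i:]) in BLACKLISTED_DOMAINS
               for i in range(len(parts)))

if __name__ == "__main__":
    print(is_blacklisted("http://cdn.malware.example/payload.js"))  # True
    print(is_blacklisted("https://example.org/index.html"))         # False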
annual computer security applications conference | 2010
Andrew G. West; Adam J. Aviv; Jian Chang; Insup Lee
IP blacklists are a spam-filtering tool employed by a large number of email providers. Centrally maintained and well regarded, blacklists can filter 80+% of spam without having to perform computationally expensive content-based filtering. However, spammers can vary which hosts send spam (often in intelligent ways), and as a result, some percentage of spamming IPs are not actively listed on any blacklist. Blacklists also provide a previously untapped resource of rich historical information. Leveraging this history in combination with spatial reasoning, this paper presents a novel reputation model (PreSTA) designed to aid in spam classification. In simulation on email arriving at a large university mail system, PreSTA is capable of classifying up to 50% of spam not identified by blacklists alone, and 93% of spam on average (when used in combination with blacklists). Further, the system consistently maintains this blockage rate even during periods of decreased blacklist performance. PreSTA is scalable and can classify over 500,000 emails an hour. Such a system can be implemented as a complementary blacklist service or used as a first-level filter or prioritization mechanism on an email server.
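A minimal sketch of how blacklist history and a spatial (subnet) neighborhood could be blended into an IP reputation score; the /24 grouping, weighting, and data shapes are assumptions for illustration, not PreSTA's actual model.

# Illustrative sketch: score an IP using (a) how often it has appeared on a
# blacklist historically and (b) the listing rate of its /24 neighborhood.
import ipaddress

def ip_reputation(ip, history, neighborhood, alpha=0.6):
    """history: dict ip -> fraction of past observation windows in which the
    IP was blacklisted; neighborhood: dict /24 network -> listing rate.
    alpha weights the IP's own history against its subnet (placeholder)."""
    own = history.get(ip, 0.0)
    net = str(ipaddress.ip_network(f"{ip}/24", strict=False))
    spatial = neighborhood.get(net, 0.0)
    return alpha * own + (1 - alpha) * spatial   # higher = more spam-like

if __name__ == "__main__":
    history = {"198.51.100.23": 0.10}
    neighborhood = {"198.51.100.0/24": 0.75}
    print(round(ip_reputation("198.51.100.23", history, neighborhood), 2))  # 0.36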
Future Generation Computer Systems | 2012
Andrew G. West; Jian Chang; Krishna K. Venkatasubramanian; Insup Lee
Collaborative functionality is increasingly prevalent in web applications. Such functionality permits individuals to add, and sometimes modify, web content, often with minimal barriers to entry. Ideally, large bodies of knowledge can be amassed and shared in this manner. However, such software also provides a medium for nefarious persons to operate. By determining the extent to which participating content and agents can be trusted, one can identify useful contributions. In this work, we define the notion of trust for collaborative web applications and survey the state of the art for calculating, interpreting, and presenting trust values. Though the techniques can be applied broadly, Wikipedia's archetypal nature makes it a focal point for discussion.
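As a toy illustration of one family of trust calculations such a survey covers (content-persistence measures in the WikiTrust vein), the sketch below credits an author when contributed text survives later revisions; the update rule is a simplification invented here, not any surveyed system's exact formula.

# Illustrative sketch: a persistence-based author trust value that rises when
# the author's text survives the next revision and falls when it is removed
# (simplified update rule, for illustration only).

def update_trust(trust, survived, rate=0.1):
    """trust in [0, 1]; survived is True if the contribution persisted
    through the next revision."""
    target = 1.0 if survived else 0.0
    return trust + rate * (target - trust)

if __name__ == "__main__":
    trust = 0.5
    for survived in [True, True, False, True]:
        trust = update_trust(trust, survived)
    print(round(trust, 3))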
european workshop on system security | 2009
Andrew G. West; Adam J. Aviv; Jian Chang; Vinayak S Prabhu; Matt Blaze; Sampath Kannan; Insup Lee; Jonathan M. Smith; Oleg Sokolsky
Quantitative Trust Management (QTM) provides a dynamic interpretation of authorization policies for access control decisions, based upon the evolving reputations of the entities involved. QuanTM, a QTM system, selectively combines elements from trust management and reputation management to create a novel method for policy evaluation. Trust management, while effective in managing access with delegated credentials (as in PolicyMaker and KeyNote), needs greater flexibility in handling situations of partial trust. Reputation management provides a means to quantify trust, but lacks delegation and policy enforcement. This paper reports on QuanTM's design decisions and novel policy evaluation procedure. A representation of quantified trust relationships, the trust dependency graph, and a sample QuanTM application specific to the KeyNote trust management language are also proposed.
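A minimal sketch of a trust dependency graph in the spirit described above, with delegation edges weighted by quantified trust. Aggregating chain trust as a product of edge weights is a common simplification used here for illustration; it is not necessarily QuanTM's evaluation procedure.

# Illustrative sketch: a weighted delegation graph where the trust placed in a
# delegation chain is aggregated as the product of its edge weights.

def chain_trust(graph, chain):
    """graph: dict mapping (delegator, delegatee) -> trust in [0, 1];
    chain: list of principals from root authorizer to requester."""
    trust = 1.0
    for src, dst in zip(chain, chain[1:]):
        trust *= graph.get((src, dst), 0.0)   # missing edge = no trust
    return trust

if __name__ == "__main__":
    graph = {("root", "alice"): 0.9, ("alice", "bob"): 0.8}
    print(round(chain_trust(graph, ["root", "alice", "bob"]), 2))  # 0.72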
international symposium on wikis and open collaboration | 2010
Andrew G. West; Sampath Kannan; Insup Lee
STiki is an anti-vandalism tool for Wikipedia. Unlike similar tools, STiki does not rely on natural language processing (NLP) over the article or diff text to locate vandalism. Instead, STiki leverages spatio-temporal properties of revision metadata. The feasibility of utilizing such properties was demonstrated in our prior work, which found they perform comparably to NLP efforts while being more efficient, robust to evasion, and language independent. STiki is a real-time, on-Wikipedia implementation based on these properties. It consists of (1) a server-side processing engine that examines revisions, scoring the likelihood that each is vandalism, and (2) a client-side GUI that presents likely vandalism to end users for definitive classification (and, if necessary, reversion on Wikipedia). Our demonstration will provide an introduction to spatio-temporal properties, demonstrate the STiki software, and discuss alternative research uses for the open-source code.
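A toy sketch of the server/client division of labor described above: the server scores incoming revisions and keeps them in a priority queue, and a client fetches the revision currently judged most likely to be vandalism for human review. The scoring values and class below are stand-ins, not STiki's actual engine or API.

# Illustrative sketch: server-side queue of scored revisions; clients pop the
# revision currently judged most likely to be vandalism.
import heapq

class RevisionQueue:
    def __init__(self):
        self._heap = []   # (negated score, revision id) gives a max-queue

    def enqueue(self, rev_id, vandalism_prob):
        heapq.heappush(self._heap, (-vandalism_prob, rev_id))

    def next_for_review(self):
        neg_prob, rev_id = heapq.heappop(self._heap)
        return rev_id, -neg_prob

if __name__ == "__main__":
    q = RevisionQueue()
    q.enqueue(1001, 0.15)
    q.enqueue(1002, 0.93)
    q.enqueue(1003, 0.48)
    print(q.next_for_review())  # (1002, 0.93)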
international symposium on wikis and open collaboration | 2011
Andrew G. West; Avantika Agrawal; Phillip Baker; Brittney Exline; Insup Lee
Collaborative models (e.g., wikis) are an increasingly prevalent Web technology. However, the open access that defines such systems can also be utilized for nefarious purposes. In particular, this paper examines the use of collaborative functionality to add inappropriate hyperlinks to destinations outside the host environment (i.e., link spam). The collaborative encyclopedia, Wikipedia, is the basis for our analysis. Recent research has exposed vulnerabilities in Wikipedia's link spam mitigation, finding that human editors are latent and dwindling in quantity. To this end, we propose and develop an autonomous classifier for link additions. Such a system presents unique challenges. For example, low barriers to entry invite a diversity of spam types, not just those with economic motivations. Moreover, issues can arise with how a link is presented (regardless of the destination). In this work, a spam corpus is extracted from over 235,000 link additions to English Wikipedia. From this, 40+ features are codified and analyzed. These indicators are computed using wiki metadata, landing site analysis, and external data sources. The resulting classifier attains 64% recall at 0.5% false positives (ROC-AUC = 0.97). Such performance could enable egregious link additions to be blocked automatically with low false-positive rates, while prioritizing the remainder for human inspection. Finally, a live Wikipedia implementation of the technique has been developed.
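A minimal sketch of the kinds of features such a classifier could compute for a single link addition, mixing wiki metadata with properties of the URL itself; the field names and this tiny feature set are placeholders, not the paper's 40+ indicators.

# Illustrative sketch: a handful of placeholder features for one link addition,
# drawn from wiki metadata and from the URL itself.
from urllib.parse import urlparse

def link_addition_features(addition):
    """addition: hypothetical dict with 'url', 'editor_is_anonymous',
    'editor_account_age_days', and 'anchor_text'."""
    host = urlparse(addition["url"]).hostname or ""
    return {
        "editor_is_anonymous": addition["editor_is_anonymous"],
        "editor_account_age_days": addition["editor_account_age_days"],
        "url_length": len(addition["url"]),
        "host_digit_count": sum(ch.isdigit() for ch in host),
        "tld": host.rsplit(".", 1)[-1] if "." in host else "",
        "anchor_is_bare_url": addition["anchor_text"].startswith("http"),
    }

if __name__ == "__main__":
    add = {"url": "http://cheap-pills4u.example/buy", "editor_is_anonymous": True,
           "editor_account_age_days": 0, "anchor_text": "http://cheap-pills4u.example"}
    print(link_addition_features(add))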
trust and trustworthy computing | 2011
Jian Chang; Krishna K. Venkatasubramanian; Andrew G. West; Sampath Kannan; Boon Thau Loo; Oleg Sokolsky; Insup Lee
The Border Gateway Protocol (BGP) works by frequently exchanging updates that disseminate reachability information about IP prefixes (i.e., IP address blocks) between Autonomous Systems (ASes) on the Internet. The ideal operation of BGP relies on three major behavioral assumptions (BAs): (1) the information contained in an update is legal and correct, (2) a route to a prefix is stable, and (3) the route adheres to the valley-free routing policy. The current operation of BGP implicitly trusts all ASes to adhere to these assumptions. However, several documented violations of these assumptions attest to the fact that such an assumption of trust is perilous. This paper presents AS-TRUST, a scheme that comprehensively characterizes the trustworthiness of ASes with respect to their adherence to the behavioral assumptions. AS-TRUST quantifies trust using the notion of AS reputation. To compute reputation, AS-TRUST analyzes updates received in the past. It then classifies the resulting observations into multiple types of feedback. The feedback is used by a reputation function that uses Bayesian statistics to compute a probabilistic view of AS trustworthiness. This information can then be used to improve day-to-day BGP operation by enabling better-informed route preference and dampening decisions at the ASes. Our implementation of the AS-TRUST scheme using publicly available BGP traces demonstrates that: (1) the number of ASes involved in violating the BGP behavioral assumptions is significant, and (2) the proposed reputation mechanism provides a multi-fold improvement in the ability of ASes to operate in the presence of BA violations.
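A minimal sketch of a Bayesian reputation update of the kind mentioned above, using the posterior mean of a Beta distribution over conforming versus violating observations; the exact reputation function in AS-TRUST may differ, and the counts below are invented.

# Illustrative sketch: Beta-posterior mean as a reputation value computed from
# counts of assumption-conforming vs. assumption-violating observations.

def as_reputation(conforming, violating):
    """Posterior mean of Beta(1 + conforming, 1 + violating)."""
    return (conforming + 1) / (conforming + violating + 2)

if __name__ == "__main__":
    # e.g., 980 updates consistent with the behavioral assumptions, 20 violations
    print(round(as_reputation(980, 20), 3))  # 0.979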
conference on email and anti-spam | 2011
Andrew G. West; Jian Chang; Krishna K. Venkatasubramanian; Oleg Sokolsky; Insup Lee
Collaborative functionality is an increasingly prevalent web technology. To encourage participation, these systems usually have low barriers to entry and permissive privileges. Unsurprisingly, ill-intentioned users try to leverage these characteristics for nefarious purposes. In this work, a particular abuse is examined: link spamming, the addition of promotional or otherwise inappropriate hyperlinks. Our analysis focuses on the wiki model and, in particular, the collaborative encyclopedia Wikipedia. A principal goal of spammers is to maximize exposure, the number of people who view a link. Creating and analyzing the first Wikipedia link spam corpus, we find that existing spam strategies perform quite poorly in this regard. The status quo spamming model relies on link persistence to accumulate exposures, a strategy that fails given the diligence of the Wikipedia community. Instead, we propose a model that exploits the latency inherent in human anti-spam enforcement. Statistical estimation suggests our novel model would produce significantly more link exposures than status quo techniques. More critically, the strategy could prove economically viable for perpetrators, incentivizing its exploitation. To this end, we also address mitigation strategies.
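A toy illustration of the exposure arithmetic underlying the comparison above: expected exposures for a spam link are roughly the page's view rate multiplied by the time the link survives, so exploiting enforcement latency on high-traffic pages can outperform long persistence on obscure ones. All numbers below are invented for illustration, not measurements from the paper.

# Illustrative sketch: expected link exposures = page view rate x survival time.

def expected_exposures(views_per_hour, survival_hours):
    return views_per_hour * survival_hours

if __name__ == "__main__":
    # Persistence strategy: an obscure page, but the link lasts a long time.
    persistence = expected_exposures(views_per_hour=2, survival_hours=240)
    # Latency strategy: a heavily viewed page, reverted within minutes.
    latency = expected_exposures(views_per_hour=5000, survival_hours=0.25)
    print(persistence, latency)   # 480 vs. 1250.0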