
Publication


Featured research published by Sadia Afroz.


IEEE Symposium on Security and Privacy | 2012

Detecting Hoaxes, Frauds, and Deception in Writing Style Online

Sadia Afroz; Michael Brennan; Rachel Greenstadt

In digital forensics, questions often arise about the authors of documents: their identity, demographic background, and whether they can be linked to other documents. The field of stylometry uses linguistic features and machine learning techniques to answer these questions. While stylometry techniques can identify authors with high accuracy in non-adversarial scenarios, their accuracy is reduced to random guessing when faced with authors who intentionally obfuscate their writing style or attempt to imitate that of another author. While these results are good for privacy, they raise concerns about fraud. We argue that some linguistic features change when people hide their writing style and by identifying those features, stylistic deception can be recognized. The major contribution of this work is a method for detecting stylistic deception in written documents. We show that using a large feature set, it is possible to distinguish regular documents from deceptive documents with 96.6% accuracy (F-measure). We also present an analysis of linguistic features that can be modified to hide writing style.
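Large feature sets in stylometry typically include frequencies of function words and characters, since these are hard to consciously control. The core mechanics of frequency-based authorship attribution can be sketched as follows; this is a toy illustration with hypothetical texts, a tiny feature set, and a nearest-profile rule, not the paper's classifier:

```python
from collections import Counter
import math

# A tiny, illustrative feature set: real stylometric systems use
# hundreds of function-word, character, and syntactic features.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "i", "that", "it"]

def feature_vector(text):
    """Relative frequency of each function word in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def attribute(document, candidate_texts):
    """Return the candidate author whose style profile is closest."""
    doc_vec = feature_vector(document)
    return max(candidate_texts,
               key=lambda a: cosine(doc_vec, feature_vector(candidate_texts[a])))
```

The paper's point is that deceptive writers shift exactly these kinds of frequency profiles, which is what makes both imitation detectable and attribution evadable.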


ACM Conference on Computer and Communications Security | 2014

A Critical Evaluation of Website Fingerprinting Attacks

Marc Juarez; Sadia Afroz; Gunes Acar; Claudia Diaz; Rachel Greenstadt

Recent studies on Website Fingerprinting (WF) claim to have found highly effective attacks on Tor. However, these studies make assumptions about user settings, adversary capabilities, and the nature of the Web that do not necessarily hold in practical scenarios. This study critically evaluates these assumptions by conducting the attack in settings where they do not hold. We show that certain variables usually omitted from the current WF model, such as users' browsing habits and differences in location and Tor Browser Bundle version, have a significant impact on the efficacy of the attack. We also empirically show how prior work succumbs to the base rate fallacy in the open-world scenario. We address this problem by augmenting our classification method with a verification step. We conclude that even though this approach reduces the number of false positives by over 63%, it does not completely solve the problem, which remains an open issue for WF attacks.
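The base rate fallacy noted above can be made concrete with a small Bayes' rule calculation (the rates below are hypothetical, not the paper's measurements): even a classifier with a high true-positive rate and low false-positive rate yields mostly false alarms when the monitored pages are rarely visited.

```python
def open_world_precision(tpr, fpr, base_rate):
    """Probability that a positive classification is a true hit,
    given the prevalence (base rate) of monitored pages."""
    true_positives = tpr * base_rate
    false_positives = fpr * (1 - base_rate)
    return true_positives / (true_positives + false_positives)

# Hypothetical numbers: 90% true-positive rate, 1% false-positive rate,
# but only 1 in 1000 page loads visits a monitored page.
precision = open_world_precision(0.90, 0.01, 0.001)
# Precision collapses to under 10%: most alarms are false.
```

This is why a verification step that suppresses false positives matters more in the open world than raw classifier accuracy suggests.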


Privacy Enhancing Technologies | 2012

Use fewer instances of the letter i: toward writing style anonymization

Andrew W. E. McDonald; Sadia Afroz; Aylin Caliskan; Ariel Stolerman; Rachel Greenstadt

This paper presents Anonymouth, a novel framework for anonymizing writing style. Without accounting for style, anonymous authors risk identification. This framework is necessary to provide a tool for testing the consistency of anonymized writing style and a mechanism for adaptive attacks against stylometry techniques. Our framework defines the steps necessary to anonymize documents and implements them. A key contribution of this work is this framework, including novel methods for identifying which features of documents need to change and how they must be changed to accomplish document anonymization. In our experiment, 80% of the user study participants were able to anonymize their documents with respect to the fixed corpus and limited feature set used. However, modifying pre-written documents was found to be difficult, and the anonymization did not hold up against more extensive feature sets. It is important to note that Anonymouth is only the first step toward a tool to achieve stylometric anonymity with respect to state-of-the-art authorship attribution techniques. The topic needs further exploration in order to accomplish significant anonymity.


IEEE Symposium on Security and Privacy | 2014

Doppelgänger Finder: Taking Stylometry to the Underground

Sadia Afroz; Aylin Caliskan Islam; Ariel Stolerman; Rachel Greenstadt; Damon McCoy

Stylometry is a method for identifying the authors of anonymous texts by analyzing their writing style. While stylometric methods have produced impressive results in previous experiments, we wanted to explore their performance on a challenging dataset of particular interest to the security research community. Analysis of underground forums can provide key information about who controls a given bot network or sells a service, and the size and scope of the cybercrime underworld. Previous analyses have been accomplished primarily through analysis of limited structured metadata and painstaking manual analysis. However, the key challenge is to automate this process, since this labor-intensive manual approach clearly does not scale. We consider two scenarios. The first involves text written by an unknown cybercriminal and a set of potential suspects. This is a standard, supervised stylometry problem made more difficult by multilingual forums that mix l33t-speak conversations with data dumps. In the second scenario, the goal is to feed a forum into an analysis engine and have it output possible doppelgängers, or users with multiple accounts. While other researchers have explored this problem, we propose a method that produces good results on actual separate accounts, as opposed to data sets created by artificially splitting authors into multiple identities. For scenario 1, we achieve 77% to 84% accuracy on private messages. For scenario 2, we achieve 94% recall with 90% precision on blogs and 85.18% precision with 82.14% recall for underground forum users. We demonstrate the utility of our approach with a case study that includes applying our technique to the Carders forum and manual analysis to validate the results, enabling the discovery of previously undetected doppelgänger accounts.
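For readers comparing the precision and recall figures above, the corresponding F1 scores can be computed directly (this is the standard harmonic-mean formula, not a number reported in the paper):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Scenario 2 figures quoted above.
blog_f1 = f1(0.90, 0.94)        # blogs: ~0.92
forum_f1 = f1(0.8518, 0.8214)   # underground forum users: ~0.84
```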


Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security | 2013

Approaches to adversarial drift

Alex Kantchelian; Sadia Afroz; Ling Huang; Aylin Caliskan Islam; Brad Miller; Michael Carl Tschantz; Rachel Greenstadt; Anthony D. Joseph; J. D. Tygar

In this position paper, we argue that to be of practical interest, a machine-learning based security system must engage with the human operators beyond feature engineering and instance labeling to address the challenge of drift in adversarial environments. We propose that designers of such systems broaden the classification goal into an explanatory goal, which would deepen the interaction with system operators. To provide guidance, we advocate for an approach based on maintaining one classifier for each class of unwanted activity to be filtered. We also emphasize the necessity for the system to be responsive to the operators' constant curation of the training set. We show how this paradigm provides a property we call isolation and how it relates to classical causative attacks. In order to demonstrate the effects of drift on a binary classification task, we also report on two experiments using a previously unpublished malware data set where each instance is timestamped according to when it was seen.


Proceedings of the 2014 ACM Workshop on Artificial Intelligence and Security | 2014

Adversarial Active Learning

Brad Miller; Alex Kantchelian; Sadia Afroz; Rekha Bachwani; Edwin Dauber; Ling Huang; Michael Carl Tschantz; Anthony D. Joseph; J. D. Tygar

Active learning is an area of machine learning examining strategies for allocation of finite resources, particularly human labeling efforts and to an extent feature extraction, in situations where available data exceeds available resources. In this open problem paper, we motivate the necessity of active learning in the security domain, identify problems caused by the application of present active learning techniques in adversarial settings, and propose a framework for experimentation and implementation of active learning systems in adversarial contexts. More than other contexts, adversarial contexts particularly need active learning as ongoing attempts to evade and confuse classifiers necessitate constant generation of labels for new content to keep pace with adversarial activity. Just as traditional machine learning algorithms are vulnerable to adversarial manipulation, we discuss assumptions specific to active learning that introduce additional vulnerabilities, as well as present vulnerabilities that are amplified in the active learning setting. Lastly, we present a software architecture, Security-oriented Active Learning Testbed (SALT), for the research and implementation of active learning applications in adversarial contexts.


International World Wide Web Conference | 2017

Tools for Automated Analysis of Cybercriminal Markets

Rebecca S. Portnoff; Sadia Afroz; Greg Durrett; Jonathan K. Kummerfeld; Taylor Berg-Kirkpatrick; Damon McCoy; Kirill Levchenko; Vern Paxson

Underground forums are widely used by criminals to buy and sell a host of stolen items, datasets, resources, and criminal services. These forums contain important resources for understanding cybercrime. However, the number of forums, their size, and the domain expertise required to understand the markets makes manual exploration of these forums unscalable. In this work, we propose an automated, top-down approach for analyzing underground forums. Our approach uses natural language processing and machine learning to automatically generate high-level information about underground forums, first identifying posts related to transactions, and then extracting products and prices. We also demonstrate, via a pair of case studies, how an analyst can use these automated approaches to investigate other categories of products and transactions. We use eight distinct forums to assess our tools: Antichat, Blackhat World, Carders, Darkode, Hack Forums, Hell, L33tCrew and Nulled. Our automated approach is fast and accurate, achieving over 80% accuracy in detecting post category, product, and prices.
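As a toy illustration of the price-extraction stage described above (the actual system uses learned NLP models, not a pattern rule; the regex and example post below are hypothetical):

```python
import re

# Match dollar amounts such as "$25" or "$19.99" in a forum post.
PRICE_RE = re.compile(r"\$\s?(\d+(?:\.\d{2})?)")

def extract_prices(post):
    """Return all dollar amounts mentioned in a post, as floats."""
    return [float(m) for m in PRICE_RE.findall(post)]

extract_prices("Selling fresh dumps, $25 each or 5 for $100.")
# -> [25.0, 100.0]
```

Real forum posts mix currencies, slang, and obfuscation, which is exactly why the paper resorts to machine learning rather than rules like this one.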


IEEE Symposium on Security and Privacy | 2016

SoK: Towards Grounding Censorship Circumvention in Empiricism

Michael Carl Tschantz; Sadia Afroz; Vern Paxson

Effective evaluations of approaches to circumventing government Internet censorship require incorporating perspectives of how censors operate in practice. We undertake an extensive examination of real censors by surveying prior measurement studies and analyzing field reports and bug tickets from practitioners. We assess both deployed circumvention approaches and research proposals to consider the criteria employed in their evaluations and compare these to the observed behaviors of real censors, identifying areas where evaluations could more faithfully and effectively incorporate the practices of modern censors. These observations lead to an agenda realigning research with the predominant problems of today.


Financial Cryptography and Data Security | 2015

Computer-Supported Cooperative Crime

Vaibhav Garg; Sadia Afroz; Rebekah Overdorf; Rachel Greenstadt

This work addresses fundamental questions about the nature of cybercriminal organization. We investigate the organization of three underground forums: BlackhatWorld, Carders and L33tCrew to understand the nature of distinct communities within a forum, the structure of organization and the impact of enforcement, in particular banning members, on the structure of these forums. We find that each forum is divided into separate competing communities. Smaller communities are limited to 100–230 members, have a two-tiered hierarchy akin to a gang, and focus on a subset of cybercrime activities. Larger communities may have thousands of members and a complex organization with a distributed multi-tiered hierarchy more akin to a mob; such communities also have a more diverse cybercrime portfolio compared to smaller cohorts. Finally, despite differences in size and cybercrime portfolios, members on a single forum have similar operational practices, for example, they use the same electronic currency.


International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment | 2016

Reviewer Integration and Performance Measurement for Malware Detection

Brad Miller; Alex Kantchelian; Michael Carl Tschantz; Sadia Afroz; Rekha Bachwani; Riyaz Faizullabhoy; Ling Huang; Vaishaal Shankar; Tony Wu; George Yiu; Anthony D. Joseph; J. D. Tygar

We present and evaluate a large-scale malware detection system integrating machine learning with expert reviewers, treating reviewers as a limited labeling resource. We demonstrate that even in small numbers, reviewers can vastly improve the system's ability to keep pace with evolving threats. We conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years and containing 1.1 million binaries with 778 GB of raw feature data. Without reviewer assistance, we achieve 72% detection at a 0.5% false positive rate, performing comparably to the best vendors on VirusTotal. Given a budget of 80 accurate reviews daily, we improve detection to 89% and are able to detect 42% of malicious binaries undetected upon initial submission to VirusTotal. Additionally, we identify a previously unnoticed temporal inconsistency in the labeling of training datasets. We compare the impact of training labels obtained at the same time training data is first seen with training labels obtained months later. We find that using training labels obtained well after samples appear, and thus unavailable in practice for current training data, inflates measured detection by almost 20 percentage points. We release our cluster-based implementation, as well as a list of all hashes in our evaluation and 3% of our entire dataset.
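The temporal labeling inconsistency described above can be sketched as a simple data-handling rule (the records and field names below are hypothetical): a temporally consistent evaluation trains only with labels that were knowable when the samples were first seen, not with labels corrected months later.

```python
from datetime import date

# Hypothetical records: each binary carries the date it was first seen,
# the label available at scan time, and the (often corrected) later label.
samples = [
    {"first_seen": date(2013, 1, 5), "label_at_scan": 0, "label_later": 1},
    {"first_seen": date(2013, 2, 1), "label_at_scan": 1, "label_later": 1},
    {"first_seen": date(2013, 6, 9), "label_at_scan": 0, "label_later": 0},
]

def training_labels(samples, cutoff, realistic=True):
    """Labels for training data first seen before the cutoff.

    realistic=True uses only labels knowable at scan time, which is
    all a deployed system would actually have; realistic=False uses
    hindsight labels and inflates measured detection."""
    key = "label_at_scan" if realistic else "label_later"
    return [s[key] for s in samples if s["first_seen"] < cutoff]
```

In this toy example the first binary is only recognized as malicious months after it appears, so a realistic evaluation must train on its (wrong) scan-time label.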

Collaboration


Dive into Sadia Afroz's collaborations.

Top Co-Authors

Vern Paxson (University of California)
Damon McCoy (George Mason University)
J. D. Tygar (University of California)
Brad Miller (University of California)
Ling Huang (University of California)