Publication


Featured research published by Shomir Wilson.


Conference on Computer Supported Cooperative Work | 2013

Tweets are forever: a large-scale quantitative analysis of deleted tweets

Hazim Almuhimedi; Shomir Wilson; Bin Liu; Norman M. Sadeh; Alessandro Acquisti

This paper describes an empirical study of 1.6M deleted tweets collected over a continuous one-week period from a set of 292K Twitter users. We examine several aggregate properties of deleted tweets, including their connections to other tweets (e.g., whether they are replies or retweets), the clients used to produce them, temporal aspects of deletion, and the presence of geotagging information. Some significant differences were discovered between the two collections, namely in the clients used to post them, their conversational aspects, the sentiment vocabulary present in them, and the days of the week they were posted. However, in other dimensions for which analysis was possible, no substantial differences were found. Finally, we discuss some ramifications of this work for understanding Twitter usage and management of one's privacy.


International World Wide Web Conferences | 2016

Crowdsourcing Annotations for Websites' Privacy Policies: Can It Really Work?

Shomir Wilson; Florian Schaub; Rohan Ramanath; Norman M. Sadeh; Fei Liu; Noah A. Smith; Frederick Liu

Website privacy policies are often long and difficult to understand. While research shows that Internet users care about their privacy, they do not have time to understand the policies of every website they visit, and most users hardly ever read privacy policies. Several recent efforts aim to crowdsource the interpretation of privacy policies and use the resulting annotations to build more effective user interfaces that provide users with salient policy summaries. However, very little attention has been devoted to studying the accuracy and scalability of crowdsourced privacy policy annotations, the types of questions crowdworkers can effectively answer, and the ways in which their productivity can be enhanced. Prior research indicates that most Internet users often have great difficulty understanding privacy policies, suggesting limits to the effectiveness of crowdsourcing approaches. In this paper, we assess the viability of crowdsourcing privacy policy annotations. Our results suggest that, if carefully deployed, crowdsourcing can indeed result in the generation of non-trivial annotations and can also help identify elements of ambiguity in policies. We further introduce and evaluate a method to improve the annotation process by predicting and highlighting paragraphs relevant to specific data practices.


ACM Computing Surveys | 2017

Nudges for Privacy and Security: Understanding and Assisting Users’ Choices Online

Alessandro Acquisti; Idris Adjerid; Rebecca Balebako; Laura Brandimarte; Lorrie Faith Cranor; Saranga Komanduri; Pedro Giovanni Leon; Norman M. Sadeh; Florian Schaub; Manya Sleeper; Yang Wang; Shomir Wilson

Advancements in information technology often task users with complex and consequential privacy and security decisions. A growing body of research has investigated individuals’ choices in the presence of privacy and information security tradeoffs, the decision-making hurdles affecting those choices, and ways to mitigate such hurdles. This article provides a multi-disciplinary assessment of the literature pertaining to privacy and security decision making. It focuses on research on assisting individuals’ privacy and security choices with soft paternalistic interventions that nudge users toward more beneficial choices. The article discusses potential benefits of those interventions, highlights their shortcomings, and identifies key ethical, design, and research challenges.


Meeting of the Association for Computational Linguistics | 2016

The creation and analysis of a Website privacy policy corpus

Shomir Wilson; Florian Schaub; Aswarth Abhilash Dara; Frederick Liu; Sushain Cherivirala; Pedro Giovanni Leon; Mads Schaarup Andersen; Sebastian Zimmeck; Kanthashree Mysore Sathyendra; N. Cameron Russell; Thomas B. Norton; Eduard H. Hovy; Joel R. Reidenberg; Norman M. Sadeh

Website privacy policies are often ignored by Internet users, because these documents tend to be long and difficult to understand. However, the significance of privacy policies greatly exceeds the attention paid to them: these documents are binding legal agreements between website operators and their users, and their opaqueness is a challenge not only to Internet users but also to policy regulators. One proposed alternative to the status quo is to automate or semi-automate the extraction of salient details from privacy policy text, using a combination of crowdsourcing, natural language processing, and machine learning. However, there has been a relative dearth of datasets appropriate for identifying data practices in privacy policies. To remedy this problem, we introduce a corpus of 115 privacy policies (267K words) with manual annotations for 23K fine-grained data practices. We describe the process of using skilled annotators and a purpose-built annotation tool to produce the data. We provide findings based on a census of the annotations and show results toward automating the annotation procedure. Finally, we describe challenges and opportunities for the research community to use this corpus to advance research in both privacy and language technologies.


Sprachwissenschaft | 2017

PrivOnto: A semantic framework for the analysis of privacy policies

Alessandro Oltramari; Dhivya Piraviperumal; Florian Schaub; Shomir Wilson; Sushain Cherivirala; Thomas B. Norton; N. Cameron Russell; Peter Story; Joel R. Reidenberg; Norman M. Sadeh

Privacy policies are intended to inform users about the collection and use of their data by websites, mobile apps and other services or appliances they interact with. This also includes informing users about any choices they might have regarding such data practices. However, few users read these often long privacy policies; and those who do have difficulty understanding them, because they are written in convoluted and ambiguous language. A promising approach to help overcome this situation revolves around semi-automatically annotating policies, using combinations of crowdsourcing, machine learning and natural language processing. In this article, we introduce PrivOnto, a semantic framework to represent annotated privacy policies. PrivOnto relies on an ontology developed to represent issues identified as critical to users and/or legal experts. PrivOnto has been used to analyze a corpus of over 23,000 annotated data practices, extracted from 115 privacy policies of US-based companies. We introduce a collection of 57 SPARQL queries to extract information from the PrivOnto knowledge base, with the dual objective of (1) answering privacy questions of interest to users and (2) supporting researchers and regulators in the analysis of privacy policies at scale. We present an interactive online tool using PrivOnto to help users explore our corpus of 23,000 annotated data practices. Finally, we outline future research and open challenges in using semantic technologies for privacy policy analysis.


Archive | 2017

A Bridge from the Use-Mention Distinction to Natural Language Processing

Shomir Wilson

Within computer science, the study of the syntax and semantics of metalanguage is well developed for formal languages, and this work is applied prominently in the creation of programming languages and compilers. However, relatively little work has been done in computer science to address metalanguage in natural languages. This lack has been to the detriment of language technologies that could exploit the information expressed in metalanguage to understand users’ utterances. This chapter addresses metalanguage and quotation from the perspective of mentioned language, a closely related phenomenon, and describes its relevance to core and applied work in natural language processing (NLP), a field in computer science concerned with the interaction between computers and natural languages. Examples are given for how state-of-the-art language technologies fail to cope with mentioned language. Finally, to promote progress on the computational study of mentioned language, a rubric is given for identifying the phenomenon in text. This enables human annotators to work methodically on labeling text to train NLP systems, a crucial precursor to further computational work.


Meeting of the Association for Computational Linguistics | 2014

Determiner-Established Deixis to Communicative Artifacts in Pedagogical Text

Shomir Wilson; Jon Oberlander

Pedagogical materials frequently contain deixis to communicative artifacts such as textual structures (e.g., sections and lists), discourse entities, and illustrations. By relating such artifacts to the prose, deixis plays an essential role in structuring the flow of information in informative writing. However, existing language technologies have largely overlooked this mechanism. We examine properties of deixis to communicative artifacts using a corpus rich in determiner-established instances of the phenomenon (e.g., “this section”, “these equations”, “those reasons”) from Wikibooks, a collection of learning texts. We use this corpus in combination with WordNet to determine a set of word senses that are characteristic of the phenomenon, showing its diversity and validating intuitions about its qualities. The results motivate further research to extract the connections encoded by such deixis, with the goals of enhancing tools to present pedagogical e-texts to readers and, more broadly, improving language technologies that rely on deictic phenomena.


Ubiquitous Computing | 2013

Privacy manipulation and acclimation in a location sharing application

Shomir Wilson; Justin Cranshaw; Norman M. Sadeh; Alessandro Acquisti; Lorrie Faith Cranor; Jay Springfield; Sae Young Jeong; Arun Balasubramanian


Network and Distributed System Security Symposium | 2017

Automated Analysis of Privacy Requirements for Mobile Apps

Sebastian Zimmeck; Ziqi Wang; Lieyong Zou; Roger Iyengar; Bin Liu; Florian Schaub; Shomir Wilson; Norman M. Sadeh; Steven Michael Bellovin; Joel R. Reidenberg


Archive | 2012

Automatic Categorization of Privacy Policies: A Pilot Study

Waleed Ammar; Shomir Wilson; Norman M. Sadeh; Noah A. Smith

Collaboration


Dive into Shomir Wilson's collaborations.

Top Co-Authors

Norman M. Sadeh

Carnegie Mellon University

Sebastian Zimmeck

Carnegie Mellon University

Frederick Liu

Carnegie Mellon University

Bin Liu

Carnegie Mellon University
