Jaspreet Bhatia
Carnegie Mellon University
Publication
Featured research published by Jaspreet Bhatia.
International Conference on Software Engineering | 2016
Rocky Slavin; Xiaoyin Wang; Mitra Bokaei Hosseini; James Hester; Ram Krishnan; Jaspreet Bhatia; Travis D. Breaux; Jianwei Niu
Mobile applications frequently access sensitive personal information to meet user or business requirements. Because such information is sensitive in general, regulators increasingly require mobile-app developers to publish privacy policies that describe what information is collected. Furthermore, regulators have fined companies when these policies are inconsistent with the actual data practices of mobile apps. To help mobile-app developers check their privacy policies against their apps’ code for consistency, we propose a semi-automated framework that consists of a policy terminology-API method map that links policy phrases to API methods that produce sensitive information, and information flow analysis to detect misalignments. We present an implementation of our framework based on a privacy-policy-phrase ontology and a collection of mappings from API methods to policy phrases. Our empirical evaluation on 477 top Android apps discovered 341 potential privacy policy violations.
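The core consistency check can be pictured with a small Python sketch: a hypothetical map from policy phrases to sensitive-data API methods, plus a function that flags methods whose data reaches a network sink without a covering phrase in the policy. The map entries and flow set below are illustrative stand-ins, not the paper's ontology or analysis output.

```python
# Minimal sketch of the policy-phrase/API-method consistency check described
# above. The phrase-to-method map and the observed flows are invented examples,
# not the paper's actual ontology or flow-analysis results.

# Policy phrases mapped to Android API methods that produce that information.
PHRASE_TO_METHODS = {
    "location":  {"LocationManager.getLastKnownLocation"},
    "device id": {"TelephonyManager.getDeviceId"},
}

def find_violations(declared_phrases, flows_to_network):
    """Flag API methods whose data reaches a network sink but whose
    corresponding policy phrase never appears in the privacy policy."""
    covered = set()
    for phrase in declared_phrases:
        covered |= PHRASE_TO_METHODS.get(phrase, set())
    return sorted(flows_to_network - covered)

# Example: the policy only mentions "location", but flow analysis also saw
# the device ID leave the app over the network.
print(find_violations(
    declared_phrases={"location"},
    flows_to_network={"LocationManager.getLastKnownLocation",
                      "TelephonyManager.getDeviceId"},
))  # -> ['TelephonyManager.getDeviceId']
```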
The Journal of Legal Studies | 2016
Joel R. Reidenberg; Jaspreet Bhatia; Travis D. Breaux; Thomas B. Norton
Website privacy policies often contain ambiguous language that undermines the purpose and value of privacy notices for site users. This paper compares the impact of different regulatory models on the ambiguity of privacy policies in multiple online sectors. First, the paper develops a theory of vague and ambiguous terms. Next, the paper develops a scoring method to compare the relative vagueness of different privacy policies. Then the theory and scoring are applied using natural language processing to rate a set of policies. The ratings are compared against two benchmarks to show whether government-mandated privacy disclosures result in notices that are less ambiguous than those emerging from the market. The methodology and technical tools can provide companies with mechanisms to improve drafting, enable regulators to easily identify poor privacy policies, and empower regulators to more effectively target enforcement actions.
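A minimal illustration of lexicon-based vagueness scoring, in the spirit of the method described above but not the paper's actual theory or rating scheme: count hits from a small list of vague modifiers and normalize by policy length. The term list and normalization are invented for the example.

```python
import re

# Toy vagueness scorer: hits from a small lexicon of vague modifiers divided
# by word count. The lexicon and the scoring formula are illustrative only.
VAGUE_TERMS = {"may", "might", "generally", "typically", "certain",
               "some", "periodically", "reasonable", "occasionally"}

def vagueness_score(policy_text: str) -> float:
    words = re.findall(r"[a-z']+", policy_text.lower())
    hits = sum(1 for w in words if w in VAGUE_TERMS)
    return hits / max(len(words), 1)

print(vagueness_score(
    "We may share certain information with some third parties."))  # ~0.33
```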
ACM Transactions on Software Engineering and Methodology | 2016
Jaspreet Bhatia; Travis D. Breaux; Florian Schaub
Privacy policies describe high-level goals for corporate data practices; regulators require industries to make available conspicuous, accurate privacy policies to their customers. Consequently, software requirements must conform to those privacy policies. To help stakeholders extract privacy goals from policies, we introduce a semiautomated framework that combines crowdworker annotations, natural language typed dependency parses, and a reusable lexicon to improve goal-extraction coverage, precision, and recall. The framework evaluation consists of a five-policy corpus governing web and mobile information systems, yielding an average precision of 0.73 and recall of 0.83. The results show that no single framework element alone is sufficient to extract goals; however, the overall framework compensates for elemental limitations. Human annotators are highly adaptive at discovering annotations in new texts, but those annotations can be inconsistent and incomplete; dependency parsers lack sophisticated, tacit knowledge, but they can perform exhaustive text search for prospective requirements indicators; and while the lexicon may never completely saturate, the lexicon terms can be reliably used to improve recall. Lexical reuse reduces false negatives by 41%, increasing the average recall to 0.85. Finally, crowdworkers were able to identify and remove around 80% of false positives, which improves average precision to 0.93.
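As a rough sketch of the dependency-parse element, the snippet below uses spaCy (assuming the en_core_web_sm model is installed) to find data-action verbs from a toy lexicon and pull out their direct objects as candidate privacy goals; the verb list is a stand-in for the paper's reusable lexicon, not its actual contents.

```python
import spacy

# Stand-in lexicon of data-action verbs; the paper's reusable lexicon is
# larger and empirically derived.
ACTION_VERBS = {"collect", "share", "use", "disclose", "store"}

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def candidate_goals(sentence: str):
    """Return (verb, object-phrase) pairs as candidate privacy goals."""
    doc = nlp(sentence)
    goals = []
    for tok in doc:
        if tok.pos_ == "VERB" and tok.lemma_ in ACTION_VERBS:
            for obj in (c for c in tok.children if c.dep_ in ("dobj", "obj")):
                goals.append((tok.lemma_, " ".join(t.text for t in obj.subtree)))
    return goals

print(candidate_goals(
    "We collect your email address and share your location with partners."))
```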
2013 3rd International Workshop on Requirements Patterns (RePa) | 2013
Jaspreet Bhatia; Richa Sharma; K. K. Biswas; Smita Ghaisas
Natural language is the norm for representing requirements in industry. Such representations cannot be subjected to automated reasoning and are often ambiguous and inconsistent. Structuring natural language requirements can significantly improve reasoning about the requirements as well as reusing them in related future projects. We present a novel automated approach that utilizes Grammatical Knowledge Patterns to structure natural language requirements in the form of frames.
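A toy example of pattern-based structuring, assuming a single hypothetical "shall" pattern rather than the paper's Grammatical Knowledge Patterns: a requirement sentence is recast into a frame with agent, action, and object slots.

```python
import re

# One illustrative grammatical pattern; the slot names and the pattern itself
# are hypothetical, not taken from the paper.
SHALL_PATTERN = re.compile(
    r"^The (?P<agent>.+?) shall (?P<action>\w+) (?P<object>.+?)\.?$", re.I)

def to_frame(requirement: str):
    """Return a frame dict for a matching requirement, else None."""
    m = SHALL_PATTERN.match(requirement.strip())
    return m.groupdict() if m else None

print(to_frame("The system shall encrypt all stored passwords."))
# {'agent': 'system', 'action': 'encrypt', 'object': 'all stored passwords'}
```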
2016 IEEE 24th International Requirements Engineering Conference Workshops (REW) | 2016
Jaspreet Bhatia; Morgan C. Evans; Sudarshan Wadkar; Travis D. Breaux
Requirements analysts can model regulated data practices to identify and reason about risks of non-compliance. If terminology is inconsistent or ambiguous, however, these models and their conclusions will be unreliable. To study this problem, we investigated an approach to automatically construct an information type ontology by identifying information type hyponymy in privacy policies using Tregex patterns. Tregex is a utility to match regular expressions against constituency parse trees, which are hierarchical expressions of natural language clauses, including noun and verb phrases. We discovered the Tregex patterns by applying content analysis to 15 privacy policies from three domains (shopping, telecommunication and social networks) to identify all instances of information type hyponymy. From this dataset, three semantic and four syntactic categories of hyponymy emerged based on category completeness and word-order. Among these, we identified and empirically evaluated 26 Tregex patterns to automate the extraction of hyponyms from privacy policies. The patterns identify information type hypernym-hyponym pairs with an average precision of 0.83 and recall of 0.52 across our dataset of 15 policies.
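Tregex itself matches patterns over constituency parse trees; as a parse-free stand-in, the sketch below applies one classic "such as" lexico-syntactic pattern with plain regular expressions to recover hypernym-hyponym pairs. It only approximates the kind of match one of the 26 patterns would make.

```python
import re

# Simplified "X such as Y1, Y2 and Y3" pattern; a rough substitute for a
# Tregex pattern over a constituency parse, for illustration only.
SUCH_AS = re.compile(
    r"(?P<hypernym>[\w ]+?),? such as (?P<hyponyms>[\w ,]+?(?: and [\w ]+)?)[.;]")

def hyponym_pairs(sentence: str):
    pairs = []
    for m in SUCH_AS.finditer(sentence):
        # Crude approximation of the hypernym noun phrase: last two words.
        hypernym = " ".join(m.group("hypernym").split()[-2:])
        for hyponym in re.split(r", | and ", m.group("hyponyms")):
            if hyponym.strip():
                pairs.append((hypernym, hyponym.strip()))
    return pairs

print(hyponym_pairs(
    "We collect device information such as IP address, device ID and browser type."))
# [('device information', 'IP address'), ('device information', 'device ID'),
#  ('device information', 'browser type')]
```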
IEEE International Conference on Requirements Engineering | 2017
Jaspreet Bhatia; Travis D. Breaux
Privacy laws and international privacy standards require that companies collect only the data they have a stated purpose for, called collection limitation. Furthermore, these regimes prescribe that companies will not use data for purposes other than the purposes for which they were collected, called use limitation, except for legal purposes and when the user provides consent. To help companies write better privacy requirements that embody the use limitation and collection limitation principles, we conducted a case study to identify how purpose is expressed among five privacy policies from the shopping domain. Using content analysis, we discovered six exclusive data purpose categories. In addition, we observed natural language patterns to express purpose. Finally, we found that data purpose specificity varies with the specificity of information type descriptions. We believe this taxonomy and the patterns can help policy analysts discover missing or underspecified purposes to better comply with the collection and use limitation principles.
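The purpose-expression patterns can be illustrated with a few naive surface patterns ("to ...", "in order to ...", "for ... purposes"); these are examples in the spirit of the observed patterns, not the paper's actual catalogue or taxonomy.

```python
import re

# Naive purpose patterns; real policy language is far more varied, and the
# patterns below may overlap or over-match.
PURPOSE_PATTERNS = [
    re.compile(r"\bin order to (?P<purpose>[\w ]+?)(?=,| and |\.|$)"),
    re.compile(r"\bto (?P<purpose>[\w ]+?)(?=,| and |\.|$)"),
    re.compile(r"\bfor (?P<purpose>[\w ]+?) purposes"),
]

def extract_purposes(statement: str):
    found = []
    for pattern in PURPOSE_PATTERNS:
        found += [m.group("purpose").strip() for m in pattern.finditer(statement)]
    return found

print(extract_purposes(
    "We use your email address to send order confirmations and for marketing purposes."))
# ['send order confirmations', 'marketing']
```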
IEEE International Conference on Requirements Engineering | 2017
Morgan C. Evans; Jaspreet Bhatia; Sudarshan Wadkar; Travis D. Breaux
Requirements analysts can model regulated data practices to identify and reason about risks of non-compliance. If terminology is inconsistent or ambiguous, however, these models and their conclusions will be unreliable. To study this problem, we investigated an approach to automatically construct an information type ontology by identifying information type hyponymy in privacy policies using Tregex patterns. Tregex is a utility to match regular expressions against constituency parse trees, which are hierarchical expressions of natural language clauses, including noun and verb phrases. We discovered the Tregex patterns by applying content analysis to 30 privacy policies from six domains (shopping, telecommunication, social networks, employment, health, and news). From this dataset, three semantic and four lexical categories of hyponymy emerged based on category completeness and word-order. Among these, we identified and empirically evaluated 72 Tregex patterns to automate the extraction of hyponyms from privacy policies. The patterns match information type hyponyms with an average precision of 0.72 and recall of 0.74.
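The reported precision and recall can be reproduced in form (not in value) by comparing automatically extracted hypernym-hyponym pairs against a manually annotated gold set for a policy; the pairs below are invented for illustration.

```python
# Simple precision/recall over pair sets; the extracted and gold pairs are
# made-up examples, not data from the study.
def precision_recall(extracted: set, gold: set):
    true_pos = len(extracted & gold)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall

extracted = {("contact information", "email address"),
             ("contact information", "phone number"),
             ("device information", "browser type")}
gold = {("contact information", "email address"),
        ("contact information", "postal address"),
        ("device information", "browser type")}

print(precision_recall(extracted, gold))  # (0.666..., 0.666...)
```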
IEEE International Conference on Technologies for Homeland Security | 2017
Daniel M. Best; Jaspreet Bhatia; Elena S. Peterson; Travis D. Breaux
Information security can benefit from real-time cyber threat indicator sharing, in which companies and government agencies share their knowledge of emerging cyberattacks to benefit their sector and society at large. As attacks become increasingly sophisticated by exploiting behavioral dimensions of human computer operators, there is an increased risk to systems that store personal information. In addition, risk increases as individuals blur the boundaries between workplace and home computing (e.g., using workplace computers for personal reasons). This paper describes an architecture to leverage individual perceptions of privacy risk to compute privacy risk scores over cyber threat indicator data. Unlike security risk, which is a risk to a particular system, privacy risk concerns an individual's personal information being accessed and exploited. The architecture integrates tools to extract information entities from textual threat reports expressed in the STIX format and privacy risk estimates computed using factorial vignettes to survey individual risk perceptions. The architecture aims to optimize for scalability and adaptability to achieve real-time risk scoring.
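The scoring step can be sketched as follows, assuming information entities have already been extracted from a STIX report and that the per-entity risk weights (invented here) stand in for the factorial-vignette survey estimates; in this simplified version the worst-case entity drives the report's score.

```python
# Hypothetical per-entity risk weights standing in for survey-derived
# perception estimates; values are invented for illustration.
RISK_PERCEPTION = {
    "email address": 0.4,
    "ip address": 0.3,
    "browsing history": 0.8,
    "health record": 0.95,
}

def privacy_risk_score(entities):
    """Score a threat report from its extracted information entities;
    unknown entities get a neutral 0.5 and the maximum drives the score."""
    weights = [RISK_PERCEPTION.get(e.lower(), 0.5) for e in entities]
    return max(weights) if weights else 0.0

print(privacy_risk_score(["IP address", "browsing history"]))  # 0.8
```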
2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems (MOBILESoft) | 2016
Rocky Slavin; Xiaoyin Wang; Mitra Bokaei Hosseini; James Hester; Ram Krishnan; Jaspreet Bhatia; Travis D. Breaux; Jianwei Niu
Many Android apps heavily depend on collecting and sharing sensitive privacy information, such as device ID, location, and postal address, to provide service and value. To protect user privacy, apps are typically required by marketplaces to provide privacy policies informing users about how their private information will be processed. In this paper, we present PVDetector, an automatic tool that analyzes Android apps to detect privacy-policy violations, i.e., inconsistencies between an app’s data collection code and the corresponding description in its privacy policy.
Proceedings of the 3rd International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering | 2014
Richa Sharma; Jaspreet Bhatia; K. K. Biswas
Coordinating conjunctions have long been a major source of ambiguity in natural language statements, and the problem has been a major research focus in English linguistics. Natural language is also the most common form of expressing the requirements for an envisioned software system, and these requirements documents suffer from the same coordination ambiguity. The presence of nocuous coordination ambiguity is a major concern for requirements analysts. In this paper, we explore the applicability of the constituency test for identifying coordinating conjunction instances in requirements documents. We show through our study how the identification of nocuous and innocuous coordinating conjunctions can be improved using semantic similarity heuristics and machine learning. Our study indicates that a Naïve Bayes classifier outperforms other machine learning algorithms.
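The classification step can be illustrated with scikit-learn's Gaussian Naive Bayes over invented heuristic features (for example, similarity scores between conjuncts); the features and labels below are made up and do not reflect the study's actual feature set or data.

```python
from sklearn.naive_bayes import GaussianNB

# Each coordination instance is described by two invented heuristic features
# (e.g., conjunct-to-conjunct similarity and conjunct-to-modifier similarity),
# labeled 0 = innocuous, 1 = nocuous. All values are illustrative.
X_train = [[0.82, 0.75], [0.15, 0.20], [0.70, 0.68], [0.10, 0.30]]
y_train = [0, 1, 0, 1]

clf = GaussianNB().fit(X_train, y_train)
print(clf.predict([[0.78, 0.72], [0.12, 0.25]]))  # expect [0, 1]
```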