Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yonghee Shin is active.

Publication


Featured research published by Yonghee Shin.


IEEE Transactions on Software Engineering | 2011

Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities

Yonghee Shin; Andrew Meneely; Laurie Williams; Jason A. Osborne

Security inspection and testing require experts in security who think like an attacker. Security experts need to know code locations on which to focus their testing and inspection efforts. Since vulnerabilities are rare occurrences, locating vulnerable code locations can be a challenging task. We investigated whether software metrics obtained from source code and development history are discriminative and predictive of vulnerable code locations. If so, security experts can use this prediction to prioritize security inspection and testing efforts. The metrics we investigated fall into three categories: complexity, code churn, and developer activity metrics. We performed two empirical case studies on large, widely used open-source projects: the Mozilla Firefox web browser and the Red Hat Enterprise Linux kernel. The results indicate that 24 of the 28 metrics collected are discriminative of vulnerabilities for both projects. The models using all three types of metrics together predicted over 80 percent of the known vulnerable files with less than 25 percent false positives for both projects. Compared to a random selection of files for inspection and testing, these models would have reduced the number of files and the number of lines of code to inspect or test by over 71 and 28 percent, respectively, for both projects.
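
For readers who want to see the mechanics, here is a minimal sketch of this style of file-level prediction: a logistic regression over one metric from each category, evaluated by recall and false positive rate. The input file and column names are hypothetical, and this is not the paper's actual model.

    # Minimal sketch of file-level vulnerability prediction from software
    # metrics, in the spirit of the study above. The CSV file and column
    # names are hypothetical; the paper's actual models are not reproduced.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix

    df = pd.read_csv("file_metrics.csv")   # one row per source file
    features = ["cyclomatic_complexity", "code_churn", "num_developers"]
    X, y = df[features], df["is_vulnerable"]   # y = 1 for known vulnerable files

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
    print("recall:", tp / (tp + fn))                # vulnerable files found
    print("false positive rate:", fp / (fp + tn))   # neutral files flagged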


Computer and Communications Security | 2008

Is complexity really the enemy of software security?

Yonghee Shin; Laurie Williams

Software complexity is often hypothesized to be the enemy of software security. We performed statistical analysis on nine code complexity metrics from the JavaScript Engine in the Mozilla application framework to investigate if this hypothesis is true. Our initial results show that the nine complexity measures have weak correlation (ρ = 0.30 at best) with security problems for the Mozilla JavaScript Engine. The study should be replicated on more products with design- and code-level metrics. It may be necessary to create new complexity metrics to embody the type of complexity that leads to security problems.
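
A small illustration of the analysis behind this finding: Spearman's ρ (a rank correlation) between a per-file complexity metric and security problems, computed with SciPy. The numbers are made up, not the paper's data.

    # Spearman rank correlation between a complexity metric and security
    # problems per file, as in the analysis above. Data is illustrative.
    from scipy.stats import spearmanr

    complexity = [12, 45, 7, 88, 23, 51, 9, 30]   # e.g., cyclomatic complexity
    vulns      = [0, 1, 0, 2, 0, 1, 0, 1]         # security problems per file

    rho, p = spearmanr(complexity, vulns)
    print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
    # A rho near 0.30, as reported above, indicates only a weak association.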


Empirical Software Engineering | 2013

Can traditional fault prediction models be used for vulnerability prediction?

Yonghee Shin; Laurie Williams

Finding security vulnerabilities requires a different mindset than finding general faults in software—thinking like an attacker. Therefore, security engineers looking to prioritize security inspection and testing efforts may be better served by a prediction model that indicates security vulnerabilities rather than faults. At the same time, faults and vulnerabilities have commonalities that may allow development teams to use traditional fault prediction models and metrics for vulnerability prediction. The goal of our study is to determine whether fault prediction models can be used for vulnerability prediction or if specialized vulnerability prediction models should be developed when both models are built with traditional metrics of complexity, code churn, and fault history. We have performed an empirical study on a widely-used, large open source project, the Mozilla Firefox web browser, where 21% of the source code files have faults and only 3% of the files have vulnerabilities. Both the fault prediction model and the vulnerability prediction model provide similar vulnerability prediction capability across a wide range of classification thresholds. For example, the fault prediction model provided recall of 83% and precision of 11% at classification threshold 0.6 and the vulnerability prediction model provided recall of 83% and precision of 12% at classification threshold 0.5. Our results suggest that fault prediction models based upon traditional metrics can substitute for specialized vulnerability prediction models. However, both fault prediction and vulnerability prediction models require significant improvement to reduce false positives while providing high recall.
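
The recall and precision figures quoted above come from sweeping a classification threshold over a model's predicted probabilities; the mechanics look roughly like this (scores and labels are illustrative, not the study's data):

    # Sweeping the classification threshold, as described above: the same
    # probabilistic model yields different recall/precision trade-offs
    # depending on where predictions are cut off.
    import numpy as np
    from sklearn.metrics import precision_score, recall_score

    y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])
    scores = np.array([0.1, 0.4, 0.8, 0.3, 0.7, 0.2, 0.5, 0.9, 0.6, 0.1])

    for threshold in (0.5, 0.6, 0.7):
        y_pred = (scores >= threshold).astype(int)
        print(threshold,
              "recall:", recall_score(y_true, y_pred),
              "precision:", round(precision_score(y_true, y_pred), 2))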


Empirical Software Engineering and Measurement | 2008

An empirical model to predict security vulnerabilities using code complexity metrics

Yonghee Shin; Laurie Williams

Complexity is often hypothesized to be the enemy of software security. If this hypothesis is true, complexity metrics may be used to predict the location of security problems and to prioritize inspection and testing efforts. We performed statistical analysis on nine complexity metrics from the JavaScript Engine in the Mozilla application framework to find differences in code metrics between vulnerable and non-vulnerable code and to predict vulnerabilities. Our initial results show that complexity metrics can predict vulnerabilities at a low false positive rate, but at a high false negative rate.
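
The "low false positive rate, high false negative rate" result refers to the two off-diagonal cells of a confusion matrix; a tiny example of how those rates are computed (labels and predictions are illustrative only):

    # Computing the two error rates discussed above from a confusion matrix.
    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # 1 = vulnerable file
    y_pred = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]   # model's predictions

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("false positive rate:", fp / (fp + tn))   # clean files flagged
    print("false negative rate:", fn / (fn + tp))   # vulnerable files missed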


International Conference on Software Engineering | 2012

A tactic-centric approach for automating traceability of quality concerns

Mehdi Mirakhorli; Yonghee Shin; Jane Cleland-Huang; Murat Cinar

The software architectures of business, mission, or safety critical systems must be carefully designed to balance an exacting set of quality concerns describing characteristics such as security, reliability, and performance. Unfortunately, software architectures tend to degrade over time as maintainers modify the system without understanding the underlying architectural decisions. Although this problem can be mitigated by manually tracing architectural decisions into the code, the cost and effort required to do this can be prohibitively expensive. In this paper we therefore present a novel approach for automating the construction of traceability links for architectural tactics. Our approach utilizes machine learning methods and lightweight structural analysis to detect tactic-related classes. The detected tactic-related classes are then mapped to a Tactic Traceability Information Model. We train our trace algorithm using code extracted from fifteen performance-centric and safety-critical open source software systems and then evaluate it against the Apache Hadoop framework. Our results show that automatically generated traceability links can support software maintenance activities while helping to preserve architectural qualities.
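
As a rough illustration of the detection step, the sketch below classifies class source text as tactic-related with a bag-of-words model. The code snippets and the "heartbeat" tactic label are invented, and the paper's lightweight structural analysis is not reproduced here.

    # Hedged sketch of detecting tactic-related classes with a text
    # classifier. Training snippets and labels are illustrative only.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    train_code = [
        "class HeartbeatMonitor { void sendPing() {} void onTimeout() {} }",
        "class AuditLogger { void logAccess(User u) {} }",
        "class PulseEmitter { void emitAlive() {} void checkPeers() {} }",
        "class ReportFormatter { String format(Report r) {} }",
    ]
    labels = ["heartbeat", "other", "heartbeat", "other"]

    clf = make_pipeline(TfidfVectorizer(token_pattern=r"[A-Za-z]+"), MultinomialNB())
    clf.fit(train_code, labels)
    print(clf.predict(["class NodeWatcher { void check() { emitAlive(); } }"]))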


International Conference on Software Engineering | 2012

TraceLab: an experimental workbench for equipping researchers to innovate, synthesize, and comparatively evaluate traceability solutions

Ed Keenan; Adam Czauderna; Greg Leach; Jane Cleland-Huang; Yonghee Shin; Evan Moritz; Malcom Gethers; Denys Poshyvanyk; Jonathan I. Maletic; Jane Huffman Hayes; Alex Dekhtyar; Daria Manukian; Shervin Hossein; Derek Hearn

TraceLab is designed to empower future traceability research by facilitating innovation and creativity, increasing collaboration between researchers, decreasing the startup costs and effort of new traceability research projects, and fostering technology transfer. To this end, it provides an experimental environment in which researchers can design and execute experiments in TraceLab's visual modeling environment using a library of reusable and user-defined components. TraceLab fosters research competitions by allowing researchers or industrial sponsors to launch research contests intended to focus attention on compelling traceability challenges. Contests are centered around specific traceability tasks, performed on publicly available datasets, and are evaluated using standard metrics incorporated into reusable TraceLab components. TraceLab has been released in beta-test mode to researchers at seven universities, and will be publicly released via CoEST.org in the summer of 2012. Furthermore, by late 2012 TraceLab's source code will be released as open source software, licensed under GPL. TraceLab currently runs on Windows but is designed with cross-platform issues in mind to allow easy ports to Unix and Mac environments.


ACM Symposium on Applied Computing | 2012

A comparative evaluation of two user feedback techniques for requirements trace retrieval

Yonghee Shin; Jane Cleland-Huang

In automated requirements trace retrieval, significant improvements can be realized through incorporating user feedback. In this paper we introduce a relatively new technique named Direct Query Manipulation (DQM) and compare its effectiveness against Rocchio, the current de facto standard for integrating user feedback into automated tracing methods. The two techniques are evaluated empirically through a series of simulations and a user study, conducted by tracing requirements for WorldVista, an electronic healthcare information system, against requirements from the Certification Commission for Healthcare Information Technology. Our results show that both Rocchio and DQM return significant improvements in trace quality in comparison to the vector space model, a fully automated technique. DQM performs slightly better than Rocchio in terms of trace quality with minimal difference in human effort. A hybrid approach combining the two provides further improvement over either technique alone.
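
Rocchio, the baseline in this comparison, is the classic relevance feedback update: the query vector is pulled toward documents the analyst confirmed and pushed away from those rejected. A minimal sketch with illustrative tf-idf vectors (the weights are conventional defaults, not the paper's settings):

    # Standard Rocchio relevance feedback over tf-idf vectors.
    import numpy as np

    def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
        q = alpha * query
        if len(relevant):
            q = q + beta * np.mean(relevant, axis=0)
        if len(nonrelevant):
            q = q - gamma * np.mean(nonrelevant, axis=0)
        return np.clip(q, 0, None)   # negative term weights are usually dropped

    query       = np.array([1.0, 0.0, 0.5, 0.0])    # tf-idf weights over 4 terms
    relevant    = np.array([[0.9, 0.1, 0.6, 0.0]])  # analyst-confirmed documents
    nonrelevant = np.array([[0.0, 0.8, 0.0, 0.7]])  # analyst-rejected documents
    print(rocchio(query, relevant, nonrelevant))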


International Conference on Software Engineering | 2011

An initial study on the use of execution complexity metrics as indicators of software vulnerabilities

Yonghee Shin; Laurie Williams

Allocating code inspection and testing resources to the most problematic code areas is important to reduce development time and cost. While complexity metrics collected statically from software artifacts are known to be helpful in finding vulnerable code locations, some complex code is rarely executed in practice and has less chance of its vulnerabilities being detected. To augment the use of static complexity metrics, this study examines execution complexity metrics that are collected during code execution as indicators of vulnerable code locations. We conducted case studies on two large, widely used open source projects, the Mozilla Firefox web browser and the Wireshark network protocol analyzer. Our results indicate that execution complexity metrics are better indicators of vulnerable code locations than the most commonly used static complexity metric, lines of source code. The ability of execution complexity metrics to discriminate vulnerable code locations from neutral ones and to predict vulnerable code locations varies across projects. However, vulnerability prediction models using execution complexity metrics are superior to models using static complexity metrics in reducing inspection effort.
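
The "reducing inspection effort" comparison can be read as: rank files by each model's risk score and count the lines of code that must be inspected before all vulnerable files are reached. A toy sketch, with all scores and sizes invented:

    # Effort-aware comparison of two ranking models, as discussed above.
    def loc_to_inspect(files, score_key):
        ranked = sorted(files, key=lambda f: f[score_key], reverse=True)
        total_vuln = sum(f["vulnerable"] for f in files)
        inspected = found = 0
        for f in ranked:
            inspected += f["loc"]
            found += f["vulnerable"]
            if found == total_vuln:
                break
        return inspected

    files = [
        {"loc": 500,  "vulnerable": 1, "exec_score": 0.9, "static_score": 0.4},
        {"loc": 2000, "vulnerable": 0, "exec_score": 0.2, "static_score": 0.8},
        {"loc": 300,  "vulnerable": 1, "exec_score": 0.7, "static_score": 0.3},
        {"loc": 1200, "vulnerable": 0, "exec_score": 0.1, "static_score": 0.6},
    ]
    print("LOC to inspect (execution metrics):", loc_to_inspect(files, "exec_score"))
    print("LOC to inspect (static metric):   ", loc_to_inspect(files, "static_score"))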


Automated Software Engineering | 2013

Learning effective query transformations for enhanced requirements trace retrieval

Timothy Dietrich; Jane Cleland-Huang; Yonghee Shin

In automated requirements traceability, significant improvements can be realized through incorporating user feedback into the trace retrieval process. However, existing feedback techniques are designed to improve results for individual queries. In this paper we present a novel technique designed to extend the benefits of user feedback across multiple trace queries. Our approach, named Trace Query Transformation (TQT), utilizes a novel form of Association Rule Mining to learn a set of query transformation rules which are used to improve the efficacy of future trace queries. We evaluate TQT using two different kinds of training sets. The first represents an initial set of queries directly modified by human analysts, while the second represents a set of queries generated by applying a query optimization process based on initial relevance feedback for trace links between a set of source and target documents. Both techniques are evaluated using requirements from the WorldVista Healthcare system, traced against certification requirements from the Certification Commission for Healthcare Information Technology. Results show that the TQT technique returns significant improvements in the quality of generated trace links.
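
As a heavily simplified illustration of learning transformation rules from (original, analyst-improved) query pairs, the sketch below counts which terms analysts consistently add when a given term is present. Real association rule mining over itemsets, as TQT uses, is more general; the queries here are invented.

    # Simplified rule learning from query pairs: "if term X appears,
    # analysts tend to add term Y". Illustrative data only.
    from collections import Counter, defaultdict

    query_pairs = [
        ("patient record access", "patient record access audit log"),
        ("record access control", "record access control audit"),
        ("drug order entry", "drug order entry prescription"),
    ]

    added_given_term = defaultdict(Counter)
    for original, improved in query_pairs:
        before, after = set(original.split()), set(improved.split())
        for term in before:
            for added in after - before:
                added_given_term[term][added] += 1

    # Keep rules observed at least twice, e.g. "access" -> add "audit"
    rules = {t: [a for a, n in c.items() if n >= 2]
             for t, c in added_given_term.items()}
    print({t: a for t, a in rules.items() if a})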


Proceedings of the 6th International Workshop on Traceability in Emerging Forms of Software Engineering | 2011

Traceability challenge 2011: using TraceLab to evaluate the impact of local versus global IDF on trace retrieval

Adam Czauderna; Marek Gibiec; Greg Leach; Yubin Li; Yonghee Shin; Ed Keenan; Jane Cleland-Huang

Numerous trace retrieval algorithms incorporate the standard tf-idf (term frequency, inverse document frequency) technique to weight various terms. In this paper we address Grand Challenge C-GC1 by comparing the effectiveness of computing idf based only on the local terms in the query, versus computing it based on general term usage as documented in the American National Corpus. We also address Grand Challenges L-GC1 and L-GC2 by setting ourselves the additional task of designing and conducting the experiments using the alpha-release of TraceLab. TraceLab is an experimental workbench which allows researchers to graphically model and execute a traceability experiment as a workflow of components. Results of the experiment show that the local idf approach exceeds or matches the global approach in all of the cases studied.
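
The local-versus-global distinction is simply where the document frequencies behind idf come from: the traced project's own artifacts, or an external general-English corpus. A toy sketch follows; the external frequencies are invented stand-ins, not actual American National Corpus counts.

    # Local idf (from the project's own documents) versus global idf
    # (from an external corpus's document frequencies). Illustrative data.
    import math

    docs = [
        "user shall encrypt patient data",
        "system shall log every access",
        "data shall be encrypted at rest",
    ]
    N = len(docs)

    def local_idf(term):
        df = sum(term in d.split() for d in docs)
        return math.log(N / df) if df else 0.0

    # Hypothetical document frequencies from a large general-English corpus
    GLOBAL_DF, GLOBAL_N = {"shall": 90_000, "encrypt": 120, "data": 40_000}, 500_000

    def global_idf(term):
        return math.log(GLOBAL_N / GLOBAL_DF.get(term, 1))

    for term in ("shall", "encrypt", "data"):
        print(term, round(local_idf(term), 2), round(global_idf(term), 2))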

Collaboration


Dive into Yonghee Shin's collaborations.

Top Co-Authors


Laurie Williams

North Carolina State University


Alex Dekhtyar

California Polytechnic State University
