
Publications


Featured research published by Syed Shariyar Murtaza.


International Symposium on Software Reliability Engineering | 2013

A host-based anomaly detection approach by representing system calls as states of kernel modules

Syed Shariyar Murtaza; Wael Khreich; Abdelwahab Hamou-Lhadj; Mario Couture

Despite over two decades of research, high false alarm rates, large trace sizes, and long processing times remain among the key issues in host-based anomaly intrusion detection systems. In an attempt to reduce the false alarm rate and processing time while increasing the detection rate, this paper presents a novel anomaly detection technique based on semantic interactions of system calls. The key concept is to represent system calls as states of kernel modules, analyze the state interactions, and identify anomalies by comparing the probabilities of occurrence of states in normal and anomalous traces. In addition, the proposed technique allows a visual understanding of system behaviour and hence more informed decision making. We evaluated this technique on Linux-based programs from the UNM datasets and a new, modern Firefox dataset, which we created on Linux using contemporary test suites and hacking techniques. The results show that our technique yields fewer false alarms and can handle large traces with smaller (or comparable) processing times than existing host-based anomaly intrusion detection techniques.
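The paper's core idea can be sketched in a few lines: treat a trace as a sequence of kernel-module states, build a probability profile of states from normal traces, and flag a trace whose state probabilities deviate too far from that profile. This is a minimal illustration, not the authors' implementation; the module names, traces, and deviation threshold are all invented for the example.

```python
from collections import Counter

def state_probabilities(trace):
    """Occurrence probability of each kernel-module state in a trace."""
    counts = Counter(trace)
    total = len(trace)
    return {state: c / total for state, c in counts.items()}

def is_anomalous(trace, normal_profile, threshold=0.2):
    """Flag a trace if any state's probability deviates from the normal
    profile by more than `threshold` (an illustrative cutoff)."""
    probs = state_probabilities(trace)
    states = set(probs) | set(normal_profile)
    return any(abs(probs.get(s, 0.0) - normal_profile.get(s, 0.0)) > threshold
               for s in states)

# Hypothetical traces of kernel-module states (names are illustrative)
normal = ["fs", "fs", "net", "mm", "fs", "net"]
attack = ["net", "net", "net", "net", "fs", "net"]

profile = state_probabilities(normal)
print(is_anomalous(normal, profile))  # False
print(is_anomalous(attack, profile))  # True
```

In the paper the comparison is done over interactions between states rather than isolated state frequencies, but the probability-comparison principle is the same.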


Information Sciences | 2012

Using entropy measures for comparison of software traces

Andriy V. Miranskyy; Matt Davison; R.M. Reesor; Syed Shariyar Murtaza

The analysis of execution paths (also known as software traces) collected from a given software product can help in a number of areas, including software testing, software maintenance, and program comprehension. The lack of a scalable matching algorithm operating on detailed execution paths motivates the search for an alternative solution. This paper proposes the use of word entropies for the classification of software traces. Using a well-studied defective software system as an example, we investigate the application of both the Shannon entropy and extended entropies (Landsberg-Vedral, Rényi, and Tsallis) to the classification of traces related to various software defects. Our study shows that using entropy measures for comparison gives an efficient and scalable method for comparing traces. The three extended entropies, with parameters chosen to emphasize rare events, all perform similarly and are superior to the Shannon entropy.
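The entropies named above have standard definitions over the word (e.g., function-call) frequencies of a trace: Shannon H = -Σ p ln p, Rényi H_q = ln(Σ p^q)/(1-q), and Tsallis H_q = (1 - Σ p^q)/(q-1). A minimal sketch follows; the example trace is invented, and choosing q < 1 weights rare events more heavily, in line with the abstract's remark about parameter choice.

```python
import math
from collections import Counter

def probabilities(trace):
    """Empirical word probabilities of a trace."""
    counts = Counter(trace)
    n = len(trace)
    return [c / n for c in counts.values()]

def shannon(trace):
    return -sum(p * math.log(p) for p in probabilities(trace))

def renyi(trace, q):
    """Rényi entropy; q < 1 emphasizes rare events."""
    return math.log(sum(p ** q for p in probabilities(trace))) / (1 - q)

def tsallis(trace, q):
    """Tsallis entropy; q < 1 emphasizes rare events."""
    return (1 - sum(p ** q for p in probabilities(trace))) / (q - 1)

# Hypothetical function-call trace
trace = ["open", "read", "read", "write", "close", "read"]
print(shannon(trace))
print(renyi(trace, 0.5))
print(tsallis(trace, 0.5))
```

Two traces can then be compared by the distance between their entropy values instead of by expensive path matching, which is what makes the approach scalable.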


Empirical Software Engineering | 2011

Characteristics of multiple-component defects and architectural hotspots: a large system case study

Zude Li; Nazim H. Madhavji; Syed Shariyar Murtaza; Mechelle Gittens; Andriy V. Miranskyy; David Godwin; Enzo Cialini

The architecture of a large software system is widely considered important for such reasons as: providing a common goal to the stakeholders in realising the envisaged system; helping to organise the various development teams; and capturing foundational design decisions early in the development. Studies have shown that defects originating in system architectures can consume twice as much correction effort as other defects. Clearly, then, scientific studies on architectural defects are important for their improved treatment and prevention. Previous research has focused on the extent of architectural defects in software systems. For this paper, we were motivated to ask the following two complementary questions in a case study: (i) How do multiple-component defects (MCDs), which are of architectural importance, differ from other types of defects in terms of (a) complexity and (b) persistence across development phases and releases? (ii) How do highly MCD-concentrated components (the so-called architectural hotspots) differ from other types of components in terms of their (a) interrelationships and (b) persistence across development phases and releases? Results indicate that MCDs are complex to fix and persistent across phases and releases. In comparison to a non-MCD, an MCD requires over 20 times more changes to fix and is 6 to 8 times more likely to cross a phase or a release. These findings have implications for defect detection and correction. Results also show that 20% of the subject system's components contain over 80% of the MCDs, and that these components are 2 to 3 times more likely to persist across multiple system releases than other components in the system. Such MCD-concentrated components constitute architectural "hotspots" on which management can focus for preventive maintenance and architectural quality improvement. The findings described are from an empirical study of a large legacy software system of over 20 million lines of code and over 17 years of age.


Conference of the Centre for Advanced Studies on Collaborative Research | 2010

F007: finding rediscovered faults from the field using function-level failed traces of software in the field

Syed Shariyar Murtaza; Mechelle Gittens; Zude Li; Nazim H. Madhavji

Studies show that approximately 50% to 90% of the failures reported from the field are rediscoveries of previous faults. Also, approximately 80% of failures originate from approximately 20% of the code. Despite this, identifying the origin of failures in system code remains an arduous activity and consumes substantial resources. Prior fault discovery techniques for field traces either require many pass-fail traces, discover only crashing failures, or identify only coarse-grained faulty code, such as files, as the source of the fault. This paper describes a new method (F007) that focuses on identifying finer-grained faulty code (faulty functions) from only the failed traces of deployed software. F007 extracts patterns of function calls from a historical collection of function-level failed traces, and then trains decision trees on the extracted function-call patterns for each known faulty function. For a new failure trace, F007 predicts a ranked list of faulty functions based on the probability of fault proneness obtained via the decision trees. Our case study on the Siemens suite shows that F007: (a) can identify rediscovered faulty functions (with new or old faults) with 60-86% accuracy, (b) needs to examine approximately 5-10% of the code for the Siemens suite, and (c) can discover the faulty functions in every new failed trace by using a small collection of previous failed traces. Thus, F007 can correctly identify the faulty functions for the majority (80-90%) of (field) failures with the knowledge of a fault in a small percentage (20%) of functions.
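A much-simplified sketch of the F007 idea: mine function-call patterns (here, bigrams) from historical failed traces labeled with their faulty function, then rank candidate faulty functions for a new failed trace by how strongly its patterns overlap each function's profile. The real method trains a decision tree per faulty function; the frequency-overlap score below is a stand-in for that step, and all function names and traces are hypothetical.

```python
from collections import Counter

def ngrams(trace, n=2):
    """Function-call patterns: consecutive n-tuples of calls."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def train(failed_traces):
    """failed_traces: {faulty_function: [trace, ...]} from past failures."""
    patterns = {}
    for func, traces in failed_traces.items():
        counts = Counter()
        for t in traces:
            counts.update(ngrams(t))
        patterns[func] = counts
    return patterns

def rank(patterns, new_trace):
    """Rank candidate faulty functions by pattern overlap with the new
    failed trace (a stand-in for F007's per-function decision trees)."""
    grams = Counter(ngrams(new_trace))
    scores = {f: sum((c & grams).values()) for f, c in patterns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical historical failed traces, keyed by known faulty function
history = {"parse": [["main", "parse", "read", "parse"]],
           "write": [["main", "write", "flush"]]}
model = train(history)
print(rank(model, ["main", "parse", "read"]))  # "parse" ranked first
```

A developer would then inspect functions in ranked order, which is how the abstract's "examine 5-10% of the code" figure arises.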


International Conference on Software Maintenance | 2009

Analysis of pervasive multiple-component defects in a large software system

Zude Li; Mechelle Gittens; Syed Shariyar Murtaza; Nazim H. Madhavji; Andriy V. Miranskyy; David Godwin; Enzo Cialini

Certain software defects require corrective changes repeatedly in a few components of the system. One type of such defects spans multiple components of the system, and we call such defects pervasive multiple-component defects (PMCDs). In this paper, we describe an empirical study of six releases of a large legacy software system (of approximately 20 million physical lines of code) to analyze PMCDs with respect to: (1) the complexity of fixing such defects and (2) the persistence of defect-prone components across phases and releases. The overall hypothesis in this study is that PMCDs inflict a greater negative impact than do other defects on defect-correction efficacy. Our findings show that the average number of changes required to fix PMCDs is 20-30 times the average for all defects. Also, over 80% of the defect-prone components containing PMCDs remain defect-prone in successive phases or releases. These findings strongly support the overall hypothesis. We compare our results, where possible, to those of other researchers and discuss the implications for maintenance processes and tools.


Journal of Systems and Software | 2016

Mining trends and patterns of software vulnerabilities

Syed Shariyar Murtaza; Wael Khreich; Abdelwahab Hamou-Lhadj; Ayse Basar Bener

Highlights: we mine software vulnerabilities to help vendors make decisions about future vulnerabilities in software applications; results show no significant difference in the trends of vulnerabilities; sequential patterns of vulnerability events follow the first-order Markov property; and the next vulnerability in an application can be predicted with a recall of approximately 80% and a precision of approximately 90%.

Zero-day vulnerabilities continue to be a threat because they are unknown to vendors; when attacks occur, vendors have zero days to provide remedies. New techniques for the detection of zero-day vulnerabilities in software systems are being developed, but they have their own limitations; e.g., anomaly detection techniques are prone to false alarms. To better protect software systems, it is also important to understand the relationship between vulnerabilities and their patterns over time. Mining trends and patterns of vulnerabilities is useful because it can help software vendors prepare solutions ahead of time for vulnerabilities that may occur in a software application. In this paper, we investigate the use of historical patterns of vulnerabilities to predict future vulnerabilities in software applications. In addition, we examine whether the trends of vulnerabilities in software applications have any significant meaning. We use the National Vulnerability Database (NVD) as the main source of vulnerabilities in software applications, mining vulnerabilities of the last six years, 2009 to 2014. Our results show that sequences of the same vulnerability (e.g., buffer errors) may occur 150 times in a software product. Our results also show that the number of SQL injection vulnerabilities has decreased over the last six years, while cryptographic vulnerabilities have seen an important increase. However, we have not found any statistical significance in the trends of the occurrence of vulnerabilities over time. The most interesting finding is that the sequential patterns of vulnerability events follow a first-order Markov property; that is, we can predict the next vulnerability by using only the previous vulnerability, with a recall of approximately 80% and a precision of around 90%.
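The first-order Markov property reported here means the next vulnerability type depends only on the current one, so a simple transition-count model suffices for prediction. A minimal sketch (the vulnerability sequence is invented; the actual study mines NVD data per application):

```python
from collections import Counter, defaultdict

def train_markov(history):
    """Count transitions between consecutive vulnerability types."""
    transitions = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        transitions[prev][nxt] += 1
    return transitions

def predict_next(transitions, current):
    """Most likely next vulnerability type given only the current one
    (the first-order Markov assumption)."""
    if current not in transitions:
        return None
    return transitions[current].most_common(1)[0][0]

# Hypothetical vulnerability event sequence for one application
history = ["buffer-error", "sql-injection", "buffer-error",
           "buffer-error", "xss", "buffer-error", "buffer-error"]
model = train_markov(history)
print(predict_next(model, "buffer-error"))  # "buffer-error"
```

Conditioning on longer histories would add states without improving prediction if the first-order property holds, which is what makes this finding useful in practice.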


Computational Intelligence and Security | 2015

A trace abstraction approach for host-based anomaly detection

Syed Shariyar Murtaza; Wael Khreich; Abdelwahab Hamou-Lhadj; Stephane Gagnon

High false alarm rates and long execution times are among the key issues in host-based anomaly detection systems. In this paper, we investigate the use of trace abstraction techniques for reducing the execution time of anomaly detectors while maintaining the same accuracy. The key idea is to represent system call traces as traces of kernel module interactions and use the resulting abstract traces as input to known anomaly detection techniques, such as STIDE (Sequence Time-Delay Embedding) and HMM (Hidden Markov Models). We performed experiments on three datasets: the traditional UNM dataset and two modern datasets, Firefox and ADFA-LD. The results show that kernel module traces can lead to similar or fewer false alarms and considerably smaller execution times than raw system call traces for host-based anomaly detection.
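STIDE itself is simple to sketch: record every length-k window of states seen in normal traces, then score a new trace by the fraction of its windows that are unseen. The twist in this paper is feeding it abstracted kernel-module traces instead of raw system calls, which shortens the traces and hence the running time; the module names below are illustrative.

```python
def stide_db(normal_trace, k=3):
    """Database of all length-k windows observed in normal traces."""
    return {tuple(normal_trace[i:i + k])
            for i in range(len(normal_trace) - k + 1)}

def mismatch_rate(trace, db, k=3):
    """Fraction of the trace's length-k windows absent from the database;
    a high rate suggests an anomaly."""
    windows = [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]
    if not windows:
        return 0.0
    return sum(w not in db for w in windows) / len(windows)

# Abstracted traces: kernel-module interactions rather than raw system calls
normal = ["fs", "mm", "fs", "net", "fs", "mm", "fs", "net"]
db = stide_db(normal)
print(mismatch_rate(normal, db))                        # 0.0
print(mismatch_rate(["net", "net", "net", "fs", "mm"], db))
```

Because an abstract trace has one entry per module interaction rather than per system call, the window database stays small, which is where the execution-time savings come from.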


Journal of Systems and Software | 2014

An empirical study on the use of mutant traces for diagnosis of faults in deployed systems

Syed Shariyar Murtaza; Abdelwahab Hamou-Lhadj; Nazim H. Madhavji; Mechelle Gittens

Debugging deployed systems is an arduous and time-consuming task. It is often difficult to generate traces from deployed systems due to the disturbance and overhead that trace collection may cause on a system in operation. Many organizations also do not keep historical traces of failures. On the other hand, earlier techniques focusing on fault diagnosis in deployed systems require a collection of passing-failing traces, in-house reproduction of faults, or a historical collection of failed traces. In this paper, we investigate an alternative solution: how artificial faults, generated using software mutation in a test environment, can be used to diagnose actual faults in deployed software systems. The use of traces of artificial faults can provide relief when it is not feasible to collect different kinds of traces from deployed systems. Using artificial and actual faults, we also investigate the similarity of function-call traces of different faults in functions. To achieve our goal, we use decision trees to build a model from traces generated from mutants and test it on faulty traces generated from actual programs. The application of our approach to various real-world programs shows that mutants can indeed be used to diagnose faulty functions in the original code with approximately 60-100% accuracy on reviewing 10% or less of the code, whereas contemporary techniques using pass-fail traces show poor results in the context of software maintenance. Our results also show that different faults in closely related functions occur with similar function-call traces. The use of mutation in fault diagnosis shows promising results, but the experiments also reveal the challenges of using mutants.


Source Code Analysis and Manipulation | 2014

Total ADS: Automated Software Anomaly Detection System

Syed Shariyar Murtaza; Abdelwahab Hamou-Lhadj; Wael Khreich; Mario Couture

When a software system starts behaving abnormally during normal operations, system administrators resort to logs, execution traces, and system scanners (e.g., anti-malware, intrusion detectors, etc.) to diagnose the cause of the anomaly. However, the unpredictable context in which the system runs and the daily emergence of new software threats make it extremely challenging to diagnose anomalies using current tools. Host-based anomaly detection techniques can facilitate the diagnosis of unknown anomalies, but there is no common platform implementing such techniques. In this paper, we propose an automated anomaly detection framework (Total ADS) that automatically trains different anomaly detection techniques on a normal trace stream from a software system, raises alarms on suspicious behaviour in streams of trace data, and uses visualization to facilitate analysis of the cause of the anomalies. Total ADS is an extensible Eclipse-based open source framework that employs a common trace format to support different types of traces and a common interface to adapt to a variety of anomaly detection techniques (e.g., HMM, sequence matching, etc.). Our case study on a modern Linux server shows that Total ADS automatically detects attacks on the server, shows anomalous paths in traces, and provides forensic insights.


Hawaii International Conference on System Sciences | 2016

How to Effectively Train IBM Watson: Classroom Experience

Syed Shariyar Murtaza; Parisa Lak; Ayse Basar Bener; Armen Pischdotchian

Watson is a question answering system that uses natural language processing, information retrieval, knowledge interpretation, automated reasoning, and machine learning techniques. It can analyze millions of documents and answer most questions accurately, with varying levels of confidence. However, training IBM Watson may be tedious and inefficient if a certain set of guidelines is not followed. In this paper, we discuss an effective strategy for training the IBM Watson question answering system. We applied this strategy during classroom teaching of IBM Watson in the Big Data Analytics certification program at Ryerson University. We have observed that if documents are well segmented, contain relevant titles, and have consistent formatting, then the recall of the answers can be as high as 95%.

Collaboration


Dive into Syed Shariyar Murtaza's collaboration.

Top Co-Authors

Nazim H. Madhavji
University of Western Ontario

Zude Li
University of Western Ontario

Mechelle Gittens
University of the West Indies

Mario Couture
Defence Research and Development Canada