Maryam M. Najafabadi
Florida Atlantic University
Publications
Featured research published by Maryam M. Najafabadi.
Journal of Big Data | 2015
Maryam M. Najafabadi; Flavio Villanustre; Taghi M. Khoshgoftaar; Naeem Seliya; Randall Wald; Edin Muharemagic
Big Data Analytics and Deep Learning are two high-focus areas of data science. Big Data has become important as many organizations both public and private have been collecting massive amounts of domain-specific information, which can contain useful information about problems such as national intelligence, cyber security, fraud detection, marketing, and medical informatics. Companies such as Google and Microsoft are analyzing large volumes of data for business analysis and decisions, impacting existing and future technology. Deep Learning algorithms extract high-level, complex abstractions as data representations through a hierarchical learning process. Complex abstractions are learned at a given level based on relatively simpler abstractions formulated in the preceding level in the hierarchy. A key benefit of Deep Learning is the analysis and learning of massive amounts of unsupervised data, making it a valuable tool for Big Data Analytics where raw data is largely unlabeled and uncategorized. In the present study, we explore how Deep Learning can be utilized for addressing some important problems in Big Data Analytics, including extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks. We also investigate some aspects of Deep Learning research that need further exploration to incorporate specific challenges introduced by Big Data Analytics, including streaming data, high-dimensional data, scalability of models, and distributed computing. We conclude by presenting insights into relevant future work by posing some questions, including defining data sampling criteria, domain adaptation modeling, defining criteria for obtaining useful data abstractions, improving semantic indexing, semi-supervised learning, and active learning.
bioinformatics and bioengineering | 2014
Maryam M. Najafabadi; Taghi M. Khoshgoftaar; Clifford Kemp; Naeem Seliya; Richard Zuech
The tremendous growth in computer network and Internet usage, combined with the growing number of attacks, makes network security a topic of serious concern. One of the most prevalent network attacks that can threaten computers connected to the network is the brute force attack. In this work we investigate the use of machine learners for detecting brute force attacks (on the SSH protocol) at the network level. We apply machine learning algorithms to a newly generated dataset of network flow data collected at the network level. Applying detection at the network level makes the detection approach more scalable. It also provides protection for hosts that do not have their own protection. The new dataset consists of real-world network data collected from a production network. We use four different classifiers to build brute force attack detection models. The use of different classifiers facilitates a relatively comprehensive study on the effectiveness of machine learners in the detection of brute force attacks on the SSH protocol at the network level. Empirical results show that the machine learners were quite successful in detecting the brute force attacks with a high detection rate and low false alarm rates. We also investigate the effectiveness of using ports as features during the learning process. We provide a detailed analysis of how the models built can change as a result of including or excluding port features.
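The flow-based detection idea above can be sketched as a standard supervised pipeline: extract per-flow features, label flows as normal or brute force, and train a classifier. The features and the synthetic data below are illustrative placeholders, not the paper's actual dataset or feature set.

```python
# Sketch of flow-based SSH brute force detection with a classifier.
# Features are hypothetical: [packets, bytes, duration_s, dst_port].
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic benign flows with varied sizes, durations, and ports
normal = np.column_stack([
    rng.integers(1, 200, 1000),            # packets
    rng.integers(60, 100_000, 1000),       # bytes
    rng.uniform(0.1, 60.0, 1000),          # duration (s)
    rng.choice([80, 443, 22, 25], 1000),   # destination port
])
# Brute force pattern: many short, small flows to port 22
attack = np.column_stack([
    rng.integers(8, 20, 200),
    rng.integers(500, 3000, 200),
    rng.uniform(0.5, 3.0, 200),
    np.full(200, 22),
])
X = np.vstack([normal, attack])
y = np.array([0] * 1000 + [1] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
detection_rate = pred[y_te == 1].mean()   # fraction of attacks caught
false_alarm = pred[y_te == 0].mean()      # fraction of benign flows flagged
print(f"detection={detection_rate:.2f} false_alarm={false_alarm:.2f}")
```

Dropping the `dst_port` column from `X` mirrors the paper's question of how models change when port features are excluded.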
International Journal of Reliability, Quality and Safety Engineering | 2016
Maryam M. Najafabadi; Taghi M. Khoshgoftaar; Naeem Seliya
Considering the large quantity of data flowing through network routers, there is a very high demand to detect malicious and unhealthy network traffic to provide network users with reliable network operation and security of their information. Predictive models should be built to identify whether a network traffic record is healthy or malicious. To build such models, machine learning methods have started to be used for the task of network intrusion detection. Such predictive models must monitor and analyze a large amount of network data in a reasonable amount of time (usually real time). To do so, they cannot always process the whole data, and there is a need for data reduction methods, which reduce the amount of data that needs to be processed. Feature selection is one of the data reduction methods that can be used to decrease the processing time. It is important to understand which features are most relevant to determining whether a network traffic record is malicious, and to avoid using the whole feature set, in order to make the processing time more efficient. It is also important that the simpler model built from the reduced feature set be as effective as a model that uses all the features. Considering these facts, feature selection is a very important pre-processing step in the detection of network attacks. The goal is to remove irrelevant and redundant features in order to increase the overall effectiveness of an intrusion detection system without negatively affecting the classification performance. Most of the previous feature selection studies in the area of intrusion detection have been applied on the KDD 99 dataset. As KDD 99 is an outdated dataset, in this paper we compare different feature selection methods on a relatively new dataset, called Kyoto 2006+. There is no comprehensive comparison of different feature selection approaches for this dataset.
In the present work, we study four filter-based feature selection methods which are chosen from two categories for the application of network intrusion detection. Three filter-based feature rankers and one filter-based subset evaluation technique are compared together along with the null case which applies no feature selection. We also apply statistical analysis to determine whether performance differences between these feature selection methods are significant or not. We find that among all the feature selection methods, Signal-to-Noise (S2N) gives the best performance results. It also outperforms no feature selection approach in all the experiments.
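The Signal-to-Noise (S2N) ranker highlighted above is a simple filter method: each feature is scored by the separation between the class means relative to the class spreads, and the top-ranked features are kept. A minimal sketch, using synthetic data rather than Kyoto 2006+ fields:

```python
# Signal-to-Noise (S2N) feature ranking for binary labels (0 = normal,
# 1 = attack): score = |mu_1 - mu_0| / (sigma_1 + sigma_0) per feature.
import numpy as np

def s2n_scores(X, y):
    """S2N score for each column of X under binary labels y."""
    X0, X1 = X[y == 0], X[y == 1]
    return np.abs(X1.mean(axis=0) - X0.mean(axis=0)) / (
        X1.std(axis=0) + X0.std(axis=0) + 1e-12
    )

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
y = rng.integers(0, 2, 500)
X[y == 1, 0] += 3.0   # feature 0: strongly discriminative
X[y == 1, 3] += 1.0   # feature 3: weakly discriminative

scores = s2n_scores(X, y)
ranking = np.argsort(scores)[::-1]   # features, best first
print("ranking:", ranking)
```

Selecting the top-k entries of `ranking` yields the reduced feature set that a downstream classifier would be trained on.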
bioinformatics and bioengineering | 2014
Charles Wheelus; Taghi M. Khoshgoftaar; Richard Zuech; Maryam M. Najafabadi
This paper compares and contrasts the most widely used network security datasets, evaluating their efficacy in providing a benchmark for intrusion and anomaly detection systems. The antiquated nature of some of the most widely used datasets along with their inadequacies is examined and used as a basis for discussion of a new approach to analyzing network traffic data. Live network traffic is collected that consists of real normal traffic and both real and penetration testing attack data. Attack data is then inspected and labeled by means of manual analysis. While network attacks and anomaly features vary widely, they share some commonalities that are examined here. Among these are: self-similarity convergence, periodicity, and repetition. Further, the knowledge inherent in the definition of network boundaries and advertised services can provide crucial context that allows the network analyst to consider self-aware attributes when examining network traffic sessions. To these ends the Session Aggregation for Network Traffic Analysis (SANTA) dataset is proposed. The motivation and the methodology of collection, aggregation and evaluation of the raw data are presented, as well as the conceptualization of the SANTA attributes and advantages provided by this approach.
Journal of Big Data | 2017
Maryam M. Najafabadi; Taghi M. Khoshgoftaar; Flavio Villanustre; John Holt
With the increasing demand for examining and extracting patterns from massive amounts of data, it is critical to be able to train large models to fulfill the needs that recent advances in the machine learning area create. L-BFGS (Limited-memory Broyden Fletcher Goldfarb Shanno) is a numeric optimization method that has been effectively used for parameter estimation to train various machine learning models. As the number of parameters increases, implementing this algorithm on a single machine can be insufficient, due to the limited number of computational resources available. In this paper, we present a parallelized implementation of the L-BFGS algorithm on a distributed system which includes a cluster of commodity computing machines. We use the open-source HPCC Systems (High-Performance Computing Cluster) platform as the underlying distributed system to implement the L-BFGS algorithm. We initially provide an overview of the HPCC Systems framework and how it allows for the parallel and distributed computations important for Big Data analytics and, subsequently, we explain our implementation of the L-BFGS algorithm on this platform. Our experimental results show that our large-scale implementation of the L-BFGS algorithm can easily scale from training models with millions of parameters to models with billions of parameters by simply increasing the number of commodity computational nodes.
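The use of L-BFGS for parameter estimation described above can be illustrated on a single machine with SciPy's built-in L-BFGS-B solver, here fitting a logistic regression by minimizing the negative log-likelihood. This is only a small-scale sketch; it does not reproduce the paper's distributed HPCC Systems implementation.

```python
# L-BFGS parameter estimation for logistic regression on synthetic data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 5))
w_true = np.array([1.5, -2.0, 0.5, 0.0, 3.0])
y = (1 / (1 + np.exp(-X @ w_true)) > rng.uniform(size=400)).astype(float)

def nll_and_grad(w):
    """Negative log-likelihood and its gradient (supplied to L-BFGS)."""
    p = 1 / (1 + np.exp(-(X @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

res = minimize(nll_and_grad, np.zeros(5), jac=True, method="L-BFGS-B")
print("converged:", res.success, "final loss:", round(res.fun, 4))
```

L-BFGS needs only the loss and gradient at each iterate, which is what makes the algorithm amenable to the data-parallel gradient computation the paper distributes across commodity nodes.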
Archive | 2016
Maryam M. Najafabadi; Flavio Villanustre; Taghi M. Khoshgoftaar; Naeem Seliya; Randall Wald; Edin Muharemagic
International Journal of Reliability, Quality and Safety Engineering | 2016
Maryam M. Najafabadi; Taghi M. Khoshgoftaar; Amri Napolitano
Due to the great increase in the number of attacks that occur in computer networks, there is an increasing dependence on network intrusion detection systems, which monitor and analyze the network data to detect attacks. In recent years, machine learning methods have been used to build predictive models for network intrusion detection. These methods are able to automatically extract patterns from the network data to build detection models. Defining proper features, which help models to better discriminate between normal and attack data, is a critical task. While network attacks vary widely, they share some commonalities. Many attacks, by their nature, are repetitive and exhibit behaviors different from normal traffic. Among these commonalities are self-similarity between attack packets, and periodicity and repetition characteristics seen in the attack traffic. In this paper, we study the common behaviors between two different attack types, called RUDY and DNS Amplification attacks, in order to propose new features…
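The periodicity and repetition commonalities mentioned above can be captured as simple timing features over a flow's packet timestamps, for example via the coefficient of variation of inter-arrival times (near zero for periodic, attack-like traffic). These feature definitions are illustrative placeholders, not the paper's exact attributes.

```python
# Illustrative timing features for detecting repetitive/periodic traffic.
import numpy as np

def timing_features(timestamps):
    """Inter-arrival-time (IAT) features from a flow's packet timestamps."""
    iat = np.diff(np.sort(np.asarray(timestamps, dtype=float)))
    return {
        "iat_mean": iat.mean(),
        "iat_std": iat.std(),
        # Low coefficient of variation => near-periodic traffic
        "iat_cv": iat.std() / (iat.mean() + 1e-12),
    }

periodic = timing_features(np.arange(0, 10, 0.5))   # regular, attack-like
bursty = timing_features(np.random.default_rng(3).exponential(0.5, 20))
print(periodic["iat_cv"], "<", bursty["iat_cv"])
```

Such features would sit alongside volume and payload features in the detection models the abstract describes.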
information reuse and integration | 2017
Maryam M. Najafabadi; Taghi M. Khoshgoftaar; Chad Calvert; Clifford Kemp
Distributed Denial of Service (DDoS) attacks are a popular and inexpensive form of cyber attack. Application layer DDoS attacks utilize legitimate application layer requests to overwhelm a web server. These attacks are a major threat to Internet applications and web services. The main goal of these attacks is to make the services unavailable to legitimate users by overwhelming the resources on a web server. They look valid in connection and protocol characteristics, which makes them difficult to detect. In this paper, we propose a detection method for application layer DDoS attacks, which is based on user behavior anomaly detection. We extract instances of user behaviors requesting resources from HTTP web server logs. We apply the Principal Component Analysis (PCA) subspace anomaly detection method for the detection of anomalous behavior instances. Web server logs from a web server hosting a student resource portal were collected as experimental data. We also generated nine different HTTP DDoS attacks through penetration testing. Our performance results on the collected data show that using PCA subspace anomaly detection on user behavior data can detect application layer DDoS attacks, even if they are trying to mimic a normal user’s behavior at some level.
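The PCA subspace anomaly detection approach above can be sketched as follows: fit PCA on vectors of normal user behavior, then score a new instance by its residual (reconstruction) error outside the principal subspace, where normal instances have small residuals and off-pattern ones do not. The behavior vectors here are synthetic stand-ins, not the paper's log-derived attributes.

```python
# PCA-subspace anomaly detection via reconstruction (residual) error.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Normal "user behavior" vectors: request counts over 8 resource types,
# correlated so they lie near a 2-D subspace plus small noise.
base = rng.normal(size=(300, 2))
normal = base @ rng.normal(size=(2, 8)) + rng.normal(scale=0.1, size=(300, 8))

pca = PCA(n_components=2).fit(normal)   # model of normal behavior

def residual_error(x):
    """Squared distance from x to the fitted principal subspace."""
    x_hat = pca.inverse_transform(pca.transform(x.reshape(1, -1)))
    return float(((x - x_hat) ** 2).sum())

normal_err = residual_error(normal[0])
# Anomaly: a burst of requests that breaks the normal correlations
anomaly = normal[0] + np.array([0.0, 0, 5, 0, 0, 0, -5, 0])
anomaly_err = residual_error(anomaly)
print(normal_err, "<", anomaly_err)
```

Thresholding the residual error (e.g. at a high percentile of errors on held-out normal data) then flags anomalous behavior instances.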
the florida ai research society | 2015
Richard Zuech; Taghi M. Khoshgoftaar; Naeem Seliya; Maryam M. Najafabadi; Clifford Kemp
international conference on machine learning and applications | 2015
Maryam M. Najafabadi; Taghi M. Khoshgoftaar; Chad Calvert; Clifford Kemp