M. Zubair Shafiq | Researchain

Publication

Featured research published by M. Zubair Shafiq.

measurement and modeling of computer systems | 2011

Characterizing and modeling internet traffic dynamics of cellular devices

M. Zubair Shafiq; Lusheng Ji; Alex X. Liu; Jia Wang

Understanding Internet traffic dynamics in large cellular networks is important for network design, troubleshooting, performance evaluation, and optimization. In this paper, we present the results of our study, which is based on week-long, aggregated, flow-level mobile device traffic data collected from a major cellular operator's core network. In this study, we measure and characterize the spatial and temporal dynamics of mobile Internet traffic. We distinguish our study from related work by conducting the measurement at a larger scale and by exploring mobile data traffic patterns along two new dimensions: the device types and the applications that generate such traffic. Based on the findings of our measurement analysis, we propose a Zipf-like model to capture the volume distribution of application traffic and a Markov model to capture the volume dynamics of aggregate Internet traffic. We further customize our models for different device types using an unsupervised clustering algorithm to improve prediction accuracy.
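As a rough illustration of what a Zipf-like volume model involves, the sketch below fits volume ≈ c / rank^alpha to a list of per-application traffic volumes by least squares in log-log space. The function name `fit_zipf` and the fitting procedure are assumptions made for illustration; the abstract does not say how the authors estimated their model.

```python
import math


def fit_zipf(volumes):
    """Fit volume ~ c / rank**alpha by least squares on log-log values.

    `volumes` is any iterable of positive per-application traffic volumes
    (hypothetical input; the units do not matter for the fit).
    Returns the pair (alpha, c).
    """
    # Rank applications from heaviest to lightest, ignoring empty ones.
    ranked = sorted((v for v in volumes if v > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive volumes")

    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Ordinary least-squares slope and intercept of log(volume) vs log(rank).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx

    alpha = -slope
    c = math.exp(mean_y - slope * mean_x)
    return alpha, c
```

Calling `fit_zipf` on synthetic volumes generated as `1000 / rank**1.2` recovers an exponent close to 1.2. On real traces the fit would only be approximate, which is the sense of "Zipf-like".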

IEEE ACM Transactions on Networking | 2013

Large-scale measurement and characterization of cellular machine-to-machine traffic

M. Zubair Shafiq; Lusheng Ji; Alex X. Liu; Jeffrey Pang; Jia Wang

Cellular network-based machine-to-machine (M2M) communication is fast becoming a market-changing force for a wide spectrum of businesses and applications such as telematics, smart metering, point-of-sale terminals, and home security and automation systems. In this paper, we aim to answer the following important question: Does traffic generated by M2M devices impose new requirements and challenges for cellular network design and management? To answer this question, we take a first look at the characteristics of M2M traffic and compare it to traditional smartphone traffic. We have conducted our measurement analysis using a week-long traffic trace collected from a tier-1 cellular network in the US. We characterize M2M traffic from a wide range of perspectives, including temporal dynamics, device mobility, application usage, and network performance. Our experimental results show that M2M traffic exhibits significantly different patterns than smartphone traffic in multiple aspects. For instance, M2M devices have a much larger ratio of uplink-to-downlink traffic volume, their traffic typically exhibits different diurnal patterns, they are more likely to generate synchronized traffic resulting in bursty aggregate traffic volumes, and are less mobile compared to smartphones. On the other hand, we also find that M2M devices are generally competing with smartphones for network resources in co-located geographical regions. These and other findings suggest that better protocol design, more careful spectrum allocation, and modified pricing schemes may be needed to accommodate the rise of M2M devices.

security and artificial intelligence | 2009

Using spatio-temporal information in API calls with machine learning algorithms for malware detection

Faraz Ahmed; Haider Hameed; M. Zubair Shafiq; Muddassar Farooq

Run-time monitoring of program execution behavior is widely used to discriminate between benign and malicious processes running on an end-host. To this end, most existing run-time intrusion or malware detection techniques utilize information available in Windows Application Programming Interface (API) call arguments or sequences. In comparison, the key novelty of our proposed tool is the use of statistical features which are extracted from both spatial (arguments) and temporal (sequences) information available in Windows API calls. We provide this composite feature set as an input to standard machine learning algorithms to raise the final alarm. The results of our experiments show that the concurrent analysis of spatio-temporal features improves the detection accuracy of all classifiers. We also perform a scalability analysis to identify a minimal subset of API categories to be monitored whilst maintaining high detection accuracy.

international conference on computer communications | 2012

Characterizing geospatial dynamics of application usage in a 3G cellular data network

M. Zubair Shafiq; Lusheng Ji; Alex X. Liu; Jeffrey Pang; Jia Wang

Recent studies on cellular network measurement have provided evidence that significant geospatial correlations, in terms of traffic volume and application access, exist in cellular network usage. Such geospatial correlation patterns provide local optimization opportunities to cellular network operators for handling the explosive growth in the traffic volume observed in recent years. To the best of our knowledge, in this paper, we provide the first fine-grained characterization of the geospatial dynamics of application usage in a 3G cellular data network. Our analysis is based on two simultaneously collected traces from the radio access network (containing location records) and the core network (containing traffic records) of a tier-1 cellular network in the United States. To better understand the application usage in our data, we first cluster cell locations based on their application distributions and then study the geospatial dynamics of application usage across different geographical regions. The results of our measurement study present cellular network operators with fine-grained insights that can be leveraged to tune network parameter settings.

Fuzzy Sets and Systems | 2009

Fuzzy case-based reasoning for facial expression recognition

Aasia Khanum; Muid Mufti; M. Younus Javed; M. Zubair Shafiq

Fuzzy logic (FL) and case-based reasoning (CBR) are two well-known techniques for the implementation of intelligent classification systems. Each technique has its own advantages and drawbacks. FL, for example, provides an intuitive user interface, simplifies the process of knowledge representation, and minimizes the system's computational complexity in terms of time and memory usage. On the other hand, FL has problems in knowledge elicitation which render it difficult to adopt for intelligent system implementation. CBR avoids these problems by making use of past input-output data to decide the system output for the present input. The accuracy of a CBR system grows as the number of cases increases. However, more cases can mean added computational complexity in terms of space and time. In this paper, we propose that a hybrid system comprising a blend of FL and CBR can lead to a solution where the two approaches cover each other's weaknesses and benefit from each other's strengths. We support our claim by taking the problem of facial expression recognition from an input image. The facial expression recognition system presented in this paper uses a case base populated with fuzzy rules for recognizing each expression. Experimental results demonstrate that the system inherits the strengths of both methods.

international conference on network protocols | 2012

A semantics aware approach to automated reverse engineering unknown protocols

Yipeng Wang; Xiaochun Yun; M. Zubair Shafiq; Liyan Wang; Alex X. Liu; Zhibin Zhang; Danfeng Yao; Yongzheng Zhang; Li Guo

Extracting the protocol message format specifications of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network trace based protocol message format inference system that exploits the semantics of protocol messages without the executable code of application protocols. ProDecoder is based on the key insight that the n-grams of protocol traces exhibit a highly skewed frequency distribution that can be leveraged for accurate protocol message format inference. In ProDecoder, we discover the latent relationship among n-grams by first grouping protocol messages with the same semantics and then inferring message formats by keyword-based clustering and cluster sequence alignment. We implemented and evaluated ProDecoder to infer message format specifications of SMB (a binary protocol) and SMTP (a textual protocol). Our experimental results show that ProDecoder accurately parses and infers the SMB protocol with 100% precision and recall. For SMTP, ProDecoder achieves approximately 95% precision and recall.
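The skewed n-gram frequency distribution mentioned above can be inspected with a few lines of code. The following is a minimal sketch, not ProDecoder itself: it counts byte n-grams across captured messages and reports how much of the total mass the most frequent ones hold. The names `ngram_counts` and `top_share`, and the default n-gram length, are hypothetical.

```python
from collections import Counter


def ngram_counts(messages, n=3):
    """Count every byte n-gram across a list of `bytes` messages."""
    counts = Counter()
    for msg in messages:
        # Slide a window of length n over the message payload.
        for i in range(len(msg) - n + 1):
            counts[msg[i:i + n]] += 1
    return counts


def top_share(counts, k):
    """Fraction of all n-gram occurrences covered by the k most frequent."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(k))
    return top / total
```

A `top_share` value close to 1 for a small `k` indicates that a handful of n-grams, typically protocol keywords, dominate the trace.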

evolutionary computation machine learning and data mining in bioinformatics | 2009

Guidelines to Select Machine Learning Scheme for Classification of Biomedical Datasets

Ajay Kumar Tanwani; M. Jamal Afridi; M. Zubair Shafiq; Muddassar Farooq

Biomedical datasets pose a unique challenge to machine learning and data mining algorithms for classification because of their high dimensionality, multiple classes, noisy data and missing values. This paper provides a comprehensive evaluation of a set of diverse machine learning schemes on a number of biomedical datasets. To this end, we follow a four-step evaluation methodology: (1) pre-processing the datasets to remove any redundancy, (2) classification of the datasets using six different machine learning algorithms: Naive Bayes (probabilistic), multi-layer perceptron (neural network), SMO (support vector machine), IBk (instance-based learner), J48 (decision tree) and RIPPER (rule-based induction), (3) bagging and boosting each algorithm, and (4) combining the best version of each of the base classifiers to make a team of classifiers with stacking and voting techniques. Using this methodology, we have performed experiments on 31 different biomedical datasets. To the best of our knowledge, this is the first study in which such a diverse set of machine learning algorithms is evaluated on so many biomedical datasets. The important outcome of our extensive study is a set of promising guidelines which will help researchers choose the best classification scheme for a biomedical dataset of a particular nature.
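Step (4) combines base classifiers into a team. One simple form of the voting technique is plurality voting, sketched below under the assumption that each base classifier is a plain callable returning a class label. The function name `majority_vote` is hypothetical, and the study's own tooling and exact voting rule are not described in the abstract.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the label predicted by the most base classifiers.

    `classifiers` is a non-empty sequence of callables mapping a sample
    to a label. Ties go to the label that was voted first.
    """
    votes = [classify(sample) for classify in classifiers]
    # most_common keeps first-seen order among labels with equal counts.
    return Counter(votes).most_common(1)[0][0]
```

Stacking differs in that a further classifier is trained on the base classifiers' outputs instead of counting votes.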

international conference on computer communications | 2011

A distributed and privacy preserving algorithm for identifying information hubs in social networks

Muhammad Usman Ilyas; M. Zubair Shafiq; Alex X. Liu; Hayder Radha

This paper addresses the problem of identifying the top-k information hubs in a social network. Identifying top-k information hubs is crucial for many applications such as advertising in social networks, where advertisers are interested in identifying hubs to whom free samples can be given. Existing solutions are centralized, require timestamped information about pair-wise user interactions, and can only be used by social network owners, as only they have access to such data. Existing distributed and privacy-preserving algorithms suffer from poor accuracy. In this paper, we propose a new algorithm to identify information hubs that preserves user privacy. The intuition is that highly connected users tend to have more interactions with their neighbors than less connected users. Our method can identify hubs without requiring a central entity to access the complete friendship graph. We achieve this by fully distributing the computation using the Kempe-McSherry algorithm to address user privacy concerns. To the best of our knowledge, the proposed algorithm is arguably the first attempt to identify information hubs in social networks that (1) uses friendship graphs (instead of interaction graphs), (2) employs a truly distributed method over friendship graphs, and (3) maintains user privacy by not requiring users to disclose their friend associations and interactions. We evaluate the effectiveness of our proposed technique using a real-world Facebook data set containing about 3.1 million users and more than 23 million friendship links. The results of our experiments show that our algorithm is 50% more accurate than existing distributed algorithms. Results also show that the proposed algorithm can estimate the rank of the top-k information hub users more accurately than existing approaches.
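To give a feel for the kind of computation that can be spread across users, the sketch below ranks nodes of a friendship graph by their principal-eigenvector score using power iteration. It is a centralized simulation written for illustration, not the Kempe-McSherry protocol and not the authors' algorithm. The point it illustrates is that each update of a node's score only needs the current scores of that node's neighbors. The names `eigenvector_scores` and `top_k` are hypothetical.

```python
import math


def eigenvector_scores(adjacency, iterations=100):
    """Approximate principal-eigenvector scores of an undirected graph.

    `adjacency` maps each node to the set of its neighbours.
    """
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # Each node combines its own score with its neighbours' scores.
        # Keeping the node's own score (iterating on A + I) leaves the
        # eigenvectors unchanged and avoids oscillation on bipartite graphs.
        updated = {
            node: scores[node] + sum(scores[nb] for nb in adjacency[node])
            for node in adjacency
        }
        norm = math.sqrt(sum(v * v for v in updated.values()))
        scores = {node: v / norm for node, v in updated.items()}
    return scores


def top_k(scores, k):
    """Return the k highest-scoring nodes, breaking ties by node name."""
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]
```

In a real distributed setting the normalization step also has to be computed without a central party, which this sketch does not attempt.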

international conference on artificial immune systems | 2009

A Sense of 'Danger' for Windows Processes

Salman Manzoor; M. Zubair Shafiq; S. Momina Tabish; Muddassar Farooq

The sophistication of modern computer malware demands run-time malware detection strategies which are not only efficient but also robust to obfuscation and evasion attempts. In this paper, we investigate the suitability of recently proposed Dendritic Cell Algorithms (DCA), both classical DCA (cDCA) and deterministic DCA (dDCA), for malware detection at run-time. We have collected API call traces of real malware and benign processes running on the Windows operating system. We evaluate the accuracy of cDCA and dDCA in distinguishing between malware and benign processes using API call sequences. Moreover, we also study the effects of the antigen multiplier and time windows on the detection accuracy of both algorithms.

IEEE Transactions on Mobile Computing | 2015

Geospatial and Temporal Dynamics of Application Usage in Cellular Data Networks

M. Zubair Shafiq; Lusheng Ji; Alex X. Liu; Jeffrey Pang; Jia Wang

Significant geospatial and temporal correlations, in terms of traffic volume and application access, exist in cellular network usage as shown in recent studies on cellular network measurement. Such geospatial and temporal correlation patterns provide local optimization opportunities to cellular network operators for handling the explosive growth in the traffic volume observed in recent years. To the best of our knowledge, in this paper, we provide the first fine-grained joint characterization of the geospatial and temporal dynamics of application usage in a 3G cellular data network. Our analysis is based on two simultaneously collected traces from the radio access network (containing location records) and the core network (containing traffic records) of a tier-1 cellular network in the United States. To better understand the application usage in our data, we first cluster cell locations based on their application distributions and then study the geospatial and temporal dynamics of application usage across different geographical regions. The results of our measurement study present cellular network operators with fine-grained insights that can be leveraged to tune network parameter settings for better network performance and user experience.
