Rozita Dara | Researchain

Publication

Featured research published by Rozita Dara.

international symposium on neural networks | 2002

Clustering unlabeled data with SOMs improves classification of labeled real-world data

Rozita Dara; Stefan C. Kremer; Deborah A. Stacey

We show the use of a self-organizing map to cluster unlabeled data and to infer possible labelings from the clusters. Our inferred labels are presented to a multilayer perceptron along with labeled data, and performance is improved over using only the labeled data. Results are presented for a number of popular real-world benchmark problems from domains other than text. This shows one way in which unlabeled data can be used to enhance supervised learning in a general-purpose neural network.

Pattern Analysis and Applications | 2006

On poem recognition

Hamid R. Tizhoosh; Rozita Dara

Literature is one of the most significant forms of human culture. It represents a high level of intellectual activity. Humans write, read, and enjoy poems in different cultures. The purpose of this work is to initiate research in the field of poem recognition, not to understand them by electronic devices, but to provide humans with modern tools to search for and work with lyrics. This study aims to illuminate the challenges of poem recognition from different perspectives. In this paper, various measures of poetry are introduced and their effectiveness is investigated in different case studies using fuzzy logic, a Bayesian approach, and decision trees.

Pattern Recognition | 2009

Data dependency in multiple classifier systems

Rozita Dara; Mohamed S. Kamel; Nayer M. Wanas

In this paper, the data dependency of aggregation modules in multiple classifier systems is investigated. We first propose a new categorization scheme, in which combining methods are grouped into data-independent, implicitly data-dependent and explicitly data-dependent. It is argued that data-dependent approaches present the highest potential for improved performance. In this study, we intend to provide a comprehensive investigation of this argument and explore the impact of data dependency on the performance of multiple classifiers. We evaluate this impact based on two criteria, prediction accuracy and stability. In addition, we examine the effect of class imbalance and uneven data distribution on these two criteria. This paper presents the findings of an extensive set of comparative experiments. Based on the findings, it can be concluded that data-dependent aggregation methods are generally more stable and less sensitive to class imbalance. In addition, data-dependent methods exhibited superior or identical generalization ability for most of the data sets.

conference on software maintenance and reengineering | 2011

Prioritizing Requirements-Based Regression Test Cases: A Goal-Driven Practice

Mazeiar Salehie; Sen Li; Ladan Tahvildari; Rozita Dara; Shimin Li; Mark Moore

Any changes for maintenance or evolution purposes may break existing working features, or may violate the requirements established in the previous software releases. Regression testing is essential to avoid these problems, but it may end up executing many time-consuming test cases. This paper addresses the prioritization of requirements-based regression test cases. To this end, system-level testing is focused on two practical issues in industrial environments: i) addressing multiple goals regarding quality, cost and effort in a project, and ii) using non-code metrics due to the lack of detailed code metrics in some situations. This paper reports a goal-driven practice at Research In Motion (RIM) towards prioritizing requirements-based test cases regarding these issues. Goal-Question-Metric (GQM) is adopted in identifying metrics for prioritization. Two sample goals are discussed to demonstrate the approach: detecting bugs earlier and maintaining testing effort. We use two releases of a prototype Web-based email client to conduct a set of experiments based on the two mentioned goals. Finally, we discuss lessons learned from applying the goal-driven approach and experiments, and we propose a few directions for future research.

Computers & Security | 2018

Network intrusion detection system based on recursive feature addition and bigram technique

Tarfa Hamed; Rozita Dara; Stefan C. Kremer

Network and Internet security is a critical universal issue. The increased rate of cyber terrorism has put national security at risk. In addition, Internet attacks have caused severe damage to different sectors (i.e., individuals, economy, enterprises, organizations and governments). Network Intrusion Detection Systems (NIDS) are one of the solutions against these attacks. However, NIDS always need to improve their performance in terms of increasing the accuracy and decreasing false alarms. Integrating feature selection with intrusion detection has been shown to be a successful approach since feature selection can help in selecting the most informative features from the entire set of features. Usually, for stealthy and low-profile attacks (zero-day attacks), there are few neatly concealed packets distributed over a long period of time to mislead firewalls and NIDS. Besides, there are many features extracted from those packets, which may cause some machine learning-based feature selection methods to suffer from overfitting, especially when the data have large numbers of features and relatively small numbers of examples. In this paper, we propose a NIDS based on a feature selection method called Recursive Feature Addition (RFA) and a bigram technique. The system has been designed, implemented and tested. We tested the model on the ISCX 2012 data set, which is one of the most well-known and recent data sets for intrusion detection purposes. Furthermore, we propose a bigram technique to encode payload string features into a useful representation that can be used in feature selection. In addition, we propose a new evaluation metric called (combined) that combines accuracy, detection rate and false alarm rate in a way that helps in comparing different systems and selecting the best among them. The designed feature selection-based system has shown a noticeable improvement in performance using different metrics.
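
The abstract does not spell out how payload strings are encoded, so the following is only a minimal sketch of one common reading of a bigram encoding: counting overlapping character pairs and laying the counts out as fixed-length vectors that a feature selection method could consume. The function names `bigram_counts` and `encode` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def bigram_counts(payload: str) -> Counter:
    """Count overlapping character bigrams in one payload string."""
    return Counter(payload[i:i + 2] for i in range(len(payload) - 1))


def encode(payloads, vocabulary=None):
    """Turn payload strings into fixed-length bigram count vectors.

    Assumption: the vocabulary is simply every bigram seen in the
    input, sorted; the paper may build or prune it differently.
    """
    counts = [bigram_counts(p) for p in payloads]
    if vocabulary is None:
        vocabulary = sorted(set().union(*counts))
    vectors = [[c[b] for b in vocabulary] for c in counts]
    return vocabulary, vectors
```

Each column of the resulting vectors corresponds to one bigram, so a selection procedure can add or drop bigram features one at a time.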

Journal of Pattern Recognition Research | 2008

Poetic Features for Poem Recognition: A Comparative Study

Hamid R. Tizhoosh; Farhang Sahba; Rozita Dara

Poetry is a form of art that is used to express emotions and feelings. Humans can easily distinguish poetry without any sophisticated tools. This study is concerned with developing intelligent methods which can be used to distinguish poems from prose. The goal is to distinguish and extract effective poetic features with which poems/lyrics can be accurately classified from other types of text. In this paper, we propose five different approaches to poem classification. In each approach, we extracted a different set of poetic features and evaluated their performances against each other. In addition, we empirically assessed the effectiveness of traditional text classification methods for poem recognition and compared it with the proposed poetic features. While all of these approaches performed well, some showed superior results. Findings of this study suggest that the proposed features generate highly accurate classifiers, which can be used for poem mining in large databases such as the World Wide Web.

international conference on computer science & software engineering | 2016

A Novel Differential Privacy Approach that Enhances Classification Accuracy

A. N. K. Zaman; Charlie Obimbo; Rozita Dara

In the recent past, there has been a tremendous increase in large repositories of data, examples being healthcare data, consumer data from retailers, and airline passenger data. These data are continually being shared with interested parties, either anonymously, for research purposes, or openly by financial or insurance companies, for decision-making purposes. When data is shared anonymously, there is still the possibility of de-anonymizing the data. Privacy Preserving Data Publishing (PPDP) is a way to allow one to share secure data while ensuring protection against identity disclosure of an individual. Generalization of attributes is a technique of data anonymization where an attribute is replaced with a more generalized value. Differential privacy is a technique that ensures the highest level of privacy for a record owner while providing actual information about the data set. This research develops a framework by generalizing attributes of a data set that satisfy differential privacy principles for publishing secure data for sharing. The proposed algorithm is a non-interactive method to publish an anonymized data set, and the decision tree classifier showed better results compared to other existing classification works on anonymized data sets. In this paper differential privacy refers to ε-differential privacy.
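
For readers unfamiliar with the term, the standard definition of ε-differential privacy can be stated as follows. The symbols $\mathcal{M}$, $D$, $D'$ and $S$ are notation introduced here for illustration and do not come from the abstract.

```latex
% A randomized mechanism M satisfies epsilon-differential privacy if,
% for all data sets D and D' that differ in a single record and for
% every set S of possible outputs,
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S].
\]
```

A smaller ε forces the two output distributions closer together, so the presence or absence of any one record has less influence on what is published.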

multiple classifier systems | 2004

Sharing training patterns among multiple classifiers

Rozita Dara; Mohamed S. Kamel

Demand for solving complex problems has directed the research trend in intelligent systems toward the design of cooperative multi-experts. One way of achieving effective cooperation is through sharing resources such as information and components. In this paper, we study classifier combination techniques from a cooperation perspective. The degree and method by which multiple classifier systems share training resources can be a measure of cooperation. Even though data modification techniques, such as bagging and k-fold cross-validation, have been extensively used, there is no guidance on whether sharing or not sharing training patterns results in higher accuracy and under what conditions. We carried out a set of experiments to examine the effect of sharing training patterns on several architectures by varying the size of the overlap from 0% to 100% of the size of the training subsets. The overall conclusion is that sharing training patterns among classifiers is beneficial.
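
The abstract does not say how the overlapping subsets were constructed. As a rough sketch under one possible interpretation, each classifier's training subset could consist of a block of patterns shared by all classifiers plus a private block used by no one else, with the `overlap` fraction controlling the split. The function `make_subsets` and its parameters are hypothetical.

```python
import random


def make_subsets(indices, n_classifiers, subset_size, overlap, seed=0):
    """Build one training subset per classifier with a controlled overlap.

    Assumption: `overlap` is the fraction (0.0 to 1.0) of each subset
    that is common to all classifiers; the rest is disjoint.
    """
    rng = random.Random(seed)
    pool = list(indices)
    rng.shuffle(pool)

    n_shared = round(overlap * subset_size)
    n_private = subset_size - n_shared
    needed = n_shared + n_classifiers * n_private
    if needed > len(pool):
        raise ValueError("not enough patterns for the requested subsets")

    shared = pool[:n_shared]  # patterns every classifier sees
    subsets = []
    for k in range(n_classifiers):
        start = n_shared + k * n_private
        subsets.append(shared + pool[start:start + n_private])
    return subsets
```

With `overlap=0.0` the subsets are disjoint, and with `overlap=1.0` every classifier trains on the same patterns, which matches the two ends of the range described above.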

international conference on software maintenance | 2009

Using dynamic execution data to generate test cases

Rozita Dara; Shimin Li; Weining Liu; Angi Smith-Ghorbani; Ladan Tahvildari

The testing activities of the Software Verification and Validation (SV&V) team at Research In Motion (RIM) are requirements-based, which is commonly known as requirements-based testing (RBT). This paper proposes a novel approach to enhance the current RBT process at RIM, by utilizing historical testing data from previous releases, static analysis of the modified source code, and real-time execution data. The main focus is on the test case generation phase and the objective is to increase the effectiveness and efficiency of test cases in such a way that overall testing is improved. The enhanced process not only automatically generates effective test cases but also seeks to achieve high test coverage and low defect escape rate.

international conference on multiple classifier systems | 2005

Data partitioning evaluation measures for classifier ensembles

Rozita Dara; Masoud Makrehchi; Mohamed S. Kamel

Training data modification has been shown to be a successful technique for the design of classifier ensembles. The current study is concerned with the analysis of different types of training set distribution and their impact on the generalization capability of multiple classifier systems. To provide a comparative study, several probabilistic measures have been proposed to assess data partitions with different characteristics and distributions. Based on these measures, a large number of disjoint training partitions were generated and used to construct classifier ensembles. Empirical assessment of the resulting ensembles and their performances has provided insights into the selection of appropriate evaluation measures as well as the construction of an efficient population of partitions.
