Kamal Taha | Researchain

Publication

Featured research published by Kamal Taha.

IEEE Transactions on Systems, Man, and Cybernetics | 2016

Data Randomization and Cluster-Based Partitioning for Botnet Intrusion Detection

Omar Y. Al-Jarrah; Omar Alhussein; Paul D. Yoo; Sami Muhaidat; Kamal Taha; Kwangjo Kim

Botnets, which consist of remotely controlled compromised machines called bots, provide a distributed platform for several threats against cyber world entities and enterprises. An intrusion detection system (IDS) provides an efficient countermeasure against botnets. It continually monitors and analyzes network traffic for potential vulnerabilities and the possible existence of active attacks. A payload-inspection-based IDS (PI-IDS) identifies active intrusion attempts by inspecting the payloads of transmission control protocol and user datagram protocol packets and comparing them with the signatures of previously seen attacks. However, the ability of a PI-IDS to detect intrusions might be incapacitated by packet encryption. A traffic-based IDS (T-IDS) alleviates the shortcomings of PI-IDS, as it does not inspect the packet payload; instead, it analyzes the packet header to identify intrusions. As network traffic grows rapidly, not only is the detection rate critical, but the efficiency and scalability of the IDS also become more significant. In this paper, we propose a state-of-the-art T-IDS built on a novel randomized data partitioned learning model (RDPLM), relying on a compact network feature set and feature selection techniques, simplified subspacing, and a multiple randomized meta-learning technique. The proposed model has achieved 99.984% accuracy and 21.38 s training time on a well-known benchmark botnet dataset. Experimental results demonstrate that the proposed methodology outperforms other well-known machine-learning models used in the same detection task, namely, sequential minimal optimization, deep neural network, C4.5, reduced error pruning tree, and randomTree.
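The general idea of learning on random data partitions and combining the resulting models can be illustrated with a small sketch. This is not RDPLM itself: the nearest-centroid base learner, the per-label (stratified) dealing of rows, the majority vote, and all function names and toy feature values are assumptions chosen only to keep the example short and deterministic.

```python
import random
from collections import Counter

def random_partitions(rows, k, seed=0):
    """Shuffle rows within each label, then deal them round-robin into k parts.
    (Stratifying by label is an assumption of this sketch.)"""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    parts = [[] for _ in range(k)]
    for group in by_label.values():
        rng.shuffle(group)
        for i, row in enumerate(group):
            parts[i % k].append(row)
    return parts

def train_centroids(rows):
    """Toy base learner: the mean feature vector of each label."""
    sums, counts = {}, Counter()
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict_one(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda lab: dist(model[lab]))

def predict(models, features):
    """Majority vote over the models trained on the individual partitions."""
    votes = Counter(predict_one(m, features) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature "traffic" rows; the values are made up.
rows = [([1 + 0.1 * i, 1.0], "benign") for i in range(6)]
rows += [([9 + 0.1 * i, 9.0], "bot") for i in range(6)]

models = [train_centroids(part) for part in random_partitions(rows, k=3)]
print(predict(models, [2.0, 2.0]))
print(predict(models, [8.0, 8.0]))
```

Each partition is small, so each base model trains quickly and the partitions could be processed independently; the vote then combines the partial views.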

IEEE Transactions on Knowledge and Data Engineering | 2010

XCDSearch: An XML Context-Driven Search Engine

Kamal Taha; Ramez Elmasri

We present in this paper a context-driven search engine called XCDSearch for answering XML Keyword-based queries as well as Loosely Structured queries, using a stack-based sort-merge algorithm. Most current research is focused on building relationships between data elements based solely on their labels and proximity to one another, while overlooking the contexts of the elements, which may lead to erroneous results. Since a data element is generally a characteristic of its parent, its context is determined by its parent. We observe that we could treat each set of elements consisting of a parent and its children data elements as one unified entity, and then use a stack-based sort-merge algorithm employing context-driven search techniques for determining the relationships between the different unified entities. We evaluated XCDSearch experimentally and compared it with five other search engines. The results showed marked improvement.

Knowledge and Information Systems | 2010

BusSEngine: a business search engine

Kamal Taha; Ramez Elmasri

With the emergence of the World Wide Web, business databases are increasingly being queried directly by customers. The customers may not be aware of the underlying data and its structure, and might never have learned a query language that enables them to issue structured queries. Some of the business employees who query the databases may also not be aware of the structure of the data, but they are likely to be aware of some labels of elements containing data. We propose in this article: (1) an XML Keyword-Based search engine for answering business customers called BusSEngine-K, and (2) an XML Loosely Structured-Based search engine for answering business employees called BusSEngine-L. The two engines employ novel context-driven search techniques and are built on top of an XQuery search engine. The two engines were evaluated experimentally and compared with three recently proposed XML search engines. The results showed marked improvement.

IEEE Transactions on Nanobioscience | 2013

GRtoGR: A System for Mapping GO Relations to Gene Relations

Kamal Taha

We introduce in this paper a biological search engine called GRtoGR. Given a set S of genes, GRtoGR determines from the GO graph the most significant Lowest Common Ancestor (LCA) of the GO terms annotating the set S. This significant LCA annotates the genes that are the most semantically related to the set S. The framework of GRtoGR refines the concept of LCA by introducing the concepts of Relevant Lowest Common Ancestor (RLCA) and Semantically Relevant Lowest Common Ancestor (SRLCA). An SRLCA is the most significant LCA of the GO terms annotating the set S. We observe that the existence of the GO terms annotating the set S is dependent on the existence of this SRLCA in the GO graph. That is, the terms annotating a given set of genes usually have existence dependency relationships with the SRLCA of these terms. We evaluated GRtoGR experimentally and compared it with nine other methods. Results showed marked improvement.
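The plain LCA notion that this abstract starts from can be sketched on a toy directed acyclic graph. The term names and the `parents` map are hypothetical, and the sketch covers only ordinary lowest common ancestors; it does not attempt the RLCA or SRLCA refinements, whose selection criteria are not given here.

```python
def ancestors(term, parents):
    """All ancestors of a term in a DAG, including the term itself."""
    seen = {term}
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def lowest_common_ancestors(terms, parents):
    """Common ancestors of all terms that have no other common ancestor below them."""
    common = set.intersection(*(ancestors(t, parents) for t in terms))
    return {
        c for c in common
        if not any(c != d and c in ancestors(d, parents) for d in common)
    }

# Hypothetical ontology: child term -> list of parent terms.
parents = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["B"], "F": ["C", "B"]}

print(lowest_common_ancestors(["D", "E"], parents))
print(lowest_common_ancestors(["D", "C"], parents))
```

Because a term in a DAG may have several parents, a set of terms can have more than one lowest common ancestor, which is why the function returns a set.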

BMC Bioinformatics | 2013

GRank: a middleware search engine for ranking genes by relevance to given genes

Kamal Taha; Dirar Homouz; Hassan Al Muhairi; Zaid Al Mahmoud

Background: Biologists may need to know the set of genes that are semantically related to a given set of genes. For instance, a biologist may need to know the set of genes related to another set of genes known to be involved in a specific disease. Some works use the concept of gene clustering in order to identify semantically related genes. Others propose tools that return the set of genes that are semantically related to a given set of genes. Most of these gene similarity measures determine the semantic similarities among the genes based solely on the proximity to each other of the GO terms annotating the genes, while overlooking the structural dependencies among these GO terms, which may lead to low recall and precision of results.

Results: We propose in this paper a search engine called GRank, which overcomes the limitations of the current gene similarity measures outlined above as follows. It employs the concept of existence dependency to determine the structural dependencies among the GO terms annotating a given set of genes. After determining the set of genes that are semantically related to the input genes, GRank uses a microarray experiment to rank these genes based on their degree of relatedness to the input genes. We evaluated GRank experimentally and compared it with a comparable gene prediction tool called DynGO, which retrieves the genes and gene products that are relatives of input genes. Results showed marked improvement.

Conclusions: The experimental results demonstrated that GRank overcomes the limitations of current gene similarity measures. We attribute this performance to GRank’s use of the existence dependency concept for determining the semantic relationships among gene annotations. The recall and precision values for two benchmarking datasets showed that GRank outperforms the DynGO tool, which does not employ the concept of existence dependency. The demo of GRank using 11000 KEGG yeast genes and a Gene Expression Omnibus (GEO) microarray file named “GSM34635.pad” is available at: http://ecesrvr.kustar.ac.ae:8080/ (click on the link labelled Gene Ontology 2).

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2014

Determining semantically related significant genes

Kamal Taha

GO relation embodies some aspects of existence dependency. If GO term x is existence-dependent on GO term y, the presence of y implies the presence of x. Therefore, the genes annotated with the function of the GO term y are usually functionally and semantically related to the genes annotated with the function of the GO term x. A large number of gene set enrichment analysis methods have been developed in recent years. However, most of these methods overlook the structural dependencies between GO terms in the GO graph by not considering the concept of existence dependency. We propose in this paper a biological search engine called RSGSearch that identifies enriched sets of genes annotated with different functions using the concept of existence dependency. We observe that GO term x cannot be existence-dependent on GO term y if x and y have the same specificity (biological characteristics). After encoding into a numeric format the contributions of GO terms annotating target genes to the semantics of their lowest common ancestors (LCAs), RSGSearch uses a microarray experiment to identify the most significant LCA that annotates the result genes. We evaluated RSGSearch experimentally and compared it with five gene set enrichment systems. Results showed marked improvement.

IEEE Transactions on Industrial Informatics | 2015

Simplified Subspaced Regression Network for Identification of Defect Patterns in Semiconductor Wafer Maps

Fatima Adly; Omar Alhussein; Paul D. Yoo; Yousof Al-Hammadi; Kamal Taha; Sami Muhaidat; Young-Seon Jeong; Ui-Hyoung Lee; Mohammed Ismail

Wafer defects, which are primarily defective chips on a wafer, are one of the key challenges facing semiconductor manufacturing companies, as they could increase the yield losses to hundreds of millions of dollars. Fortunately, these wafer defects leave unique patterns due to their spatial dependence across wafer maps. It is thus possible to identify and predict them in order to find the point of failure in the manufacturing process accurately. This paper introduces a novel simplified subspaced regression framework for the accurate and efficient identification of defect patterns in semiconductor wafer maps. It can achieve a test error comparable to or better than the state-of-the-art machine-learning (ML)-based methods, while maintaining a low computational cost when dealing with large-scale wafer data. The effectiveness and utility of the proposed approach have been demonstrated by our experiments on real wafer defect datasets, achieving a detection accuracy of 99.884% and an R2 of 99.905%, which are far better than those of any existing methods reported in the literature.

British National Conference on Databases | 2007

OOXsearch: a search engine for answering loosely structured XML queries using OO programming

Kamal Taha; Ramez Elmasri

There has been extensive research in XML keyword-based and loosely structured querying. Some frameworks work well for certain types of XML data models and fail in others. The reason is that the proposed techniques are based on finding relationships solely between individual nodes while overlooking the context of these nodes. The context of a leaf node is determined by its parent node, because it specifies one of the characteristics of its parent node. Building relationships between individual leaf nodes without consideration of their parents may result in relationships that are semantically disconnected. Since leaf nodes are nothing but characteristics of their parents, we observe that we could treat each parent-children set of nodes as one unified entity. We then find semantic relationships between the different unified entities. Based on those observations, we propose an XML semantic search engine called OOXSearch, which answers loosely structured queries. The recall and precision of the engine were evaluated experimentally and compared with two recently proposed systems [1, 2] and the results showed marked improvement.
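The observation that a parent and its leaf children form one unified entity can be sketched with Python's standard `xml.etree.ElementTree`. The sample document, the function names, and the keyword-matching rule are illustrative assumptions; this is not the OOXSearch algorithm, only a minimal picture of why context matters when matching keywords.

```python
import xml.etree.ElementTree as ET

def unified_entities(root):
    """Pair each parent element with the values of its leaf children."""
    entities = []
    for parent in root.iter():
        # A leaf child has no element children of its own; it describes its parent.
        # Repeated child tags would overwrite each other in this simple dict.
        leaves = {c.tag: (c.text or "").strip() for c in parent if len(c) == 0}
        if leaves:
            entities.append((parent.tag, leaves))
    return entities

def search(entities, keywords):
    """Return the entities whose leaf values contain every keyword."""
    hits = []
    for tag, leaves in entities:
        text = " ".join(leaves.values()).lower()
        if all(k.lower() in text for k in keywords):
            hits.append((tag, leaves))
    return hits

# Hypothetical sample document.
DOC = """
<library>
  <book><title>XML Basics</title><year>2005</year>
    <author><name>Lee</name><country>Korea</country></author>
  </book>
  <book><title>Databases</title><year>2007</year>
    <author><name>Kim</name><country>Korea</country></author>
  </book>
</library>
"""

entities = unified_entities(ET.fromstring(DOC))
print(search(entities, ["xml", "2005"]))   # both keywords describe one book
print(search(entities, ["Lee", "2005"]))   # keywords belong to different entities
```

In this sketch, keywords that describe different parents never match as a single entity, which is the kind of semantically disconnected pairing the abstract warns about. Relating separate entities to each other (for example, a book and its author) would be a further step that is not shown.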

BMC Bioinformatics | 2016

Predicting the functions of a protein from its ability to associate with other molecules

Kamal Taha; Paul D. Yoo

Background: All proteins associate with other molecules. These associated molecules are highly predictive of the potential functions of proteins. The association of a protein and a molecule can be determined from their co-occurrences in biomedical abstracts. Extensive semantically related co-occurrences of a protein’s name and a molecule’s name in the sentences of biomedical abstracts can be considered as indicative of the association between the protein and the molecule. Dependency parsers extract textual relations from a text by determining the grammatical relations between words in a sentence. They can be used for determining the textual relations between proteins and molecules. Despite their success, they may extract textual relations with low precision. This is because they do not consider the semantic relationships between terms in a sentence (i.e., they consider only the structural relationships between the terms). Moreover, they may not be well suited for complex sentences and for long-distance textual relations.

Results: We introduce an information extraction system called PPFBM that predicts the functions of unannotated proteins from the molecules that associate with these proteins. PPFBM represents each protein by the other molecules that associate with it in the abstracts referenced in the protein’s entries in reliable biological databases. It automatically extracts each co-occurrence of a protein-molecule pair that represents a semantic relationship between the pair. Towards this, we present novel semantic rules that identify the semantic relationship between each co-occurrence of a protein-molecule pair using the syntactic structures of sentences and linguistic theories. PPFBM determines the functions of an unannotated protein p as follows. First, it determines the set Sr of annotated proteins that is semantically similar to p by matching the molecules representing p and the annotated proteins. Then, it assigns p the functional category FC if the significance of the frequency of occurrences of Sr in abstracts associated with proteins annotated with FC is statistically significantly different from the significance of the frequency of occurrences of Sr in abstracts associated with proteins annotated with all other functional categories. We evaluated the quality of PPFBM by comparing it experimentally with two other systems. Results showed marked improvement.

Conclusions: The experimental results demonstrated that PPFBM outperforms other systems that predict protein function from the textual information found within biomedical abstracts. This is because these systems do not consider the semantic relationships between terms in a sentence (i.e., they consider only the structural relationships between the terms). PPFBM’s performance advantage over these systems increases steadily as the number of training proteins increases. That is, PPFBM’s predictions become steadily more accurate as the set of training proteins grows. This is because every time a new set of test proteins is added to the current set of training proteins. A demo of PPFBM that annotates each input yeast protein (from SGD, the Saccharomyces Genome Database, available at: http://www.yeastgenome.org/download-data/curation) with the functions of Gene Ontology terms is available at: http://ecesrvr.kustar.ac.ae:8080/PPFBM/ (see Appendix for more details about the demo).
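As a point of comparison, the raw sentence-level co-occurrence signal that the abstract builds on can be counted in a few lines. This sketch deliberately omits the semantic rules that PPFBM adds on top; the placeholder names (`P1`, `M1`, `M2`), the sample sentences, and the naive sentence splitter are all assumptions made for illustration.

```python
import re
from collections import Counter

def cooccurrences(abstracts, proteins, molecules):
    """Count the sentences in which a protein name and a molecule name both appear."""
    counts = Counter()
    for text in abstracts:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            words = set(re.findall(r"[A-Za-z0-9-]+", sentence.lower()))
            for p in proteins:
                for m in molecules:
                    if p.lower() in words and m.lower() in words:
                        counts[(p, m)] += 1
    return counts

# Placeholder names and sentences, not real biology.
abstracts = [
    "P1 binds M1. P1 is studied widely.",
    "M1 turnover depends on P1. M2 was not examined.",
]
counts = cooccurrences(abstracts, proteins=["P1"], molecules=["M1", "M2"])
print(counts)
```

A count like this treats every shared sentence as evidence of association, whether or not the sentence actually relates the two names; that is the low-precision behaviour the semantic rules described above are meant to address.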

IEEE Journal of Biomedical and Health Informatics | 2015

Extracting Various Classes of Data From Biological Text Using the Concept of Existence Dependency

Kamal Taha

One of the key goals of biological natural language processing (NLP) is automatic information extraction from biomedical publications. Most current constituency and dependency parsers overlook the semantic relationships between the constituents comprising a sentence and may not be well suited for capturing complex long-distance dependencies. We propose in this paper a hybrid constituency-dependency parser for biological NLP information extraction called EDCC. EDCC aims at enhancing the state of the art of biological text mining by applying novel linguistic computational techniques that overcome the limitations of current constituency and dependency parsers outlined earlier, as follows: 1) it determines the semantic relationship between each pair of constituents in a sentence using novel semantic rules; and 2) it applies a semantic relationship extraction model that extracts information from different structural forms of constituents in sentences. EDCC can be used to extract different types of data from biological texts for purposes such as protein function prediction, genetic network construction, and protein-protein interaction detection. We evaluated the quality of EDCC by comparing it experimentally with six systems. Results showed marked improvement.
