Baharum Baharudin
Universiti Teknologi Petronas
Publication
Featured research published by Baharum Baharudin.
ieee international conference on digital ecosystems and technologies | 2009
Khairullah Khan; Baharum Baharudin; Aurangzeb Khan; Fazal-e-Malik
Opinion mining is the automatic extraction of knowledge from the opinions of others about a particular topic or problem. With the growing availability of online resources and the popularity of fast, rich opinion-sharing channels such as online review sites and personal blogs, opinion mining has become an interesting area of research. The World Wide Web is the fastest medium for collecting opinions from users. Human perception and user opinion have great potential for knowledge discovery and decision support. In this paper we present a survey of techniques and methods that promise to enable us to extract opinion-oriented information from text. The survey deals with techniques and challenges related to sentiment analysis and opinion mining. We followed a systematic literature review process to conduct it. Our focus was mainly on machine learning techniques, chosen on the basis of their usage and importance for opinion mining. We have tried to identify the most commonly used classification techniques for opinionated documents to assist future research in this area.
frontiers of information technology | 2010
Aurangzeb Khan; Baharum Baharudin; Khairullah Khan
Sentiment analysis is the process of analyzing and classifying the content of reviews about a product, event, place, etc. into positive, negative or neutral opinion. In this paper, we propose a sentence-level machine learning approach for sentiment classification of online reviews. The proposed method extracts the subjective sentences from the reviews and labels each sentence as either positive or negative based on its word-level features using a Naïve Bayesian (NB) classifier. The labeled sentences form an annotated set of sentences called the BOS (Bag-of-Sentences). We train a Support Vector Machine (SVM) classifier on the BOS for sentence polarity classification. The contextual information in each sentence structure is taken into consideration to calculate the semantic orientation. The effectiveness of the proposed method is evaluated through simulation. Results show that the proposed machine learning method achieves an average accuracy of 81%, and 83% with some contextual information. Unlike work based on word-level lexical features, this method improves polarity classification at the sentence level by focusing on sentences and on their contextual information.
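The first stage described in the abstract above, labelling sentences from word-level features with a Naïve Bayes classifier, can be sketched roughly as follows. This is a minimal illustration with Laplace smoothing, not the authors' implementation: the function names and the tiny training set are hypothetical, and the second stage (training an SVM on the resulting bag of sentences) is not shown.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labelled_sentences):
    """Count word occurrences per class from (sentence, label) pairs."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    class_counts = Counter()             # label -> number of sentences
    vocabulary = set()
    for text, label in labelled_sentences:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return word_counts, class_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, class_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in text.lower().split():
            if token in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data, for illustration only.
training = [
    ("great phone love it", "positive"),
    ("excellent battery great screen", "positive"),
    ("terrible battery hate it", "negative"),
    ("poor screen terrible sound", "negative"),
]
model = train_naive_bayes(training)
print(classify("great screen", model))
print(classify("terrible sound", model))
```

In a pipeline like the one the abstract outlines, the labels produced this way would become the training targets for the sentence-level classifier.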
international conference on software engineering and computer systems | 2011
Aurangzeb Khan; Baharum Baharudin; Khairullah Khan
Sentiment analysis is the procedure by which information is extracted from the opinions, appraisals and emotions of people with regard to entities, events and their attributes. In decision making, the opinions of others have a significant effect on customers, easing choices about online shopping, events, products, entities, etc. When an important decision needs to be made, consumers usually want to know the opinions, sentiments and emotions of others. With rapidly growing online resources such as discussion groups, forums and blogs, people are commenting via the Internet. As a result, a vast and growing amount of new data is being generated in the form of customer reviews, comments and opinions about products, events and entities. An efficient and effective sentiment analysis system for online customer reviews and comments is therefore desirable. In this paper, a rule-based, domain-independent sentiment analysis method is proposed. The proposed method separates subjective from objective sentences in reviews and blog comments. The semantic scores of subjective sentences are extracted from SentiWordNet to calculate their polarity as positive, negative or neutral based on the contextual sentence structure. The results show the effectiveness of the proposed method, which outperforms word-level and machine learning methods. The proposed method achieves an accuracy of 97.8% at the feedback level and 86.6% at the sentence level.
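The idea of scoring a sentence from lexicon values and adjusting for context can be illustrated with a very small sketch. Everything here is an assumption made for illustration: the lexicon is a hand-made toy dictionary standing in for SentiWordNet scores, the only contextual rule is a simple negation flip, and the threshold is arbitrary. The rules used in the paper are not given in the abstract.

```python
# Hand-made toy scores standing in for a real sentiment lexicon (assumption).
TOY_LEXICON = {"good": 0.6, "great": 0.8, "bad": -0.6, "poor": -0.5, "terrible": -0.9}
NEGATIONS = {"not", "never", "no"}

def sentence_polarity(sentence, lexicon=TOY_LEXICON, threshold=0.1):
    """Sum lexicon scores, flipping the sign of the next scored word after a negation."""
    score = 0.0
    negate = False
    for raw in sentence.lower().split():
        token = raw.strip(".,!?;:")
        if token in NEGATIONS:
            negate = True
            continue
        if token in lexicon:
            value = lexicon[token]
            score += -value if negate else value
            negate = False  # the negation is consumed by the first scored word
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

print(sentence_polarity("The battery is not good."))
print(sentence_polarity("Great screen!"))
print(sentence_polarity("It arrived on Monday."))
```

A sentence with no scored words comes out neutral, which loosely mirrors treating it as objective; a real system would need a proper subjectivity test.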
international visual informatics conference | 2011
Norsila binti Shamsuddin; Wan Fatimah binti Wan Ahmad; Baharum Baharudin; Mohd Rajuddin; Farahwahida Mohd
The appearance of underwater images sometimes does not represent the real-world objects. Light absorption and diffusion degrade the clarity of the images. To improve the visualization of underwater images, an approach using histogram-based color correction is applied. This approach is used because it improves contrast by redistributing the intensity distribution toward a uniform histogram. A histogram in Photoshop is a graph that shows the tonal balance of a digital image and the placement of its pixels. A few examples are shown in this paper.
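Redistributing intensities toward a uniform histogram is commonly done with histogram equalization. The sketch below shows the standard technique on a flat list of single-channel intensities; it is not taken from the paper (whose examples use Photoshop), and applying it separately to each color channel is an assumption about how it might be used for color correction.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer intensities in [0, levels)."""
    if not pixels:
        return []
    # Build the intensity histogram.
    histogram = [0] * levels
    for p in pixels:
        histogram[p] += 1
    # Cumulative distribution: cdf[i] = number of pixels with intensity <= i.
    cdf = []
    running = 0
    for count in histogram:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        return list(pixels)  # constant image: nothing to stretch
    scale = (levels - 1) / (total - cdf_min)
    # Map each pixel through the normalized cumulative distribution.
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]

# A low-contrast strip of intensities is stretched across the full range.
print(equalize_histogram([100, 100, 110, 120]))
```

The darkest input value maps to 0 and the brightest to `levels - 1`, which is what produces the contrast gain.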
international symposium on information technology | 2010
Aurangzeb Khan; Baharum Baharudin; Khairullah Khan
Feature selection and weighting are of vital concern in the text classification process because they improve the efficiency and accuracy of the text classifier. The Vector Space Model represents documents using the “Bag of Words” (BOW) model with term weighting. Document representation through this model has some limitations: it ignores term dependencies and the structure and ordering of the terms in documents. To overcome this problem, a Semantics-Based Feature Vector using Part of Speech (POS) is proposed, which extracts the concepts of terms using WordNet, co-occurring terms and associated terms. The proposed method is applied to a small document dataset, where it outperforms the term frequency/inverse document frequency (TF-IDF) with BOW feature selection method for text classification.
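For reference, the TF-IDF with bag-of-words baseline mentioned above can be sketched as follows. This uses one common variant of the weighting (relative term frequency times the natural log of inverse document frequency); the abstract does not say which variant the authors used, so the formula and the function name are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    document_frequency = Counter()
    for tokens in tokenized:
        document_frequency.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        term_frequency = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / document_frequency[term])
            for term, count in term_frequency.items()
        })
    return vectors

vectors = tfidf_vectors(["good phone good battery", "bad phone"])
print(vectors[0])
```

Note that word order and term dependencies play no part in these weights: "phone", which occurs in every document, gets weight zero regardless of context. This is the limitation the abstract points to.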
international conference on computer and information sciences | 2014
Sallam Osman Fageeri; Rohiza Ahmad; Baharum Baharudin
Mining frequent itemsets is still one of the challenges of data mining research. Frequent itemset generation produces an extremely large number of itemsets, which makes the algorithms inefficient. The reason is that most traditional approaches adopt an iterative strategy to discover the itemsets, which requires a great deal of processing. Furthermore, existing mining algorithms cannot perform efficiently because of their many repeated database scans. In this paper we introduce a new binary-based Semi-Apriori technique that efficiently discovers the frequent itemsets. Extensive experiments were carried out comparing the new technique with the existing Apriori algorithm; tentative results reveal that our technique outperforms Apriori in terms of execution time.
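One way a binary representation can cut down on repeated database scans is to encode each transaction once as a bitmask and then count support with bitwise AND. The sketch below is a plain level-wise Apriori search over such bitmasks, written only to illustrate that idea; it is not the Semi-Apriori algorithm from the paper, whose details are not given in the abstract.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise frequent itemset search over bitmask-encoded transactions."""
    items = sorted({item for t in transactions for item in t})
    bit = {item: 1 << i for i, item in enumerate(items)}
    # Single pass over the raw data: each transaction becomes one integer.
    masks = [sum(bit[item] for item in set(t)) for t in transactions]

    def support(mask):
        # A transaction contains the itemset if all of its bits are set.
        return sum(1 for m in masks if (m & mask) == mask)

    result = {}
    current = []
    for item in items:
        s = support(bit[item])
        if s >= min_support:
            result[frozenset([item])] = s
            current.append(bit[item])
    while current:
        # Join frequent k-itemsets whose union has exactly k + 1 items.
        candidates = {
            a | b
            for a, b in combinations(current, 2)
            if bin(a | b).count("1") == bin(a).count("1") + 1
        }
        current = []
        for mask in candidates:
            s = support(mask)
            if s >= min_support:
                result[frozenset(i for i in items if bit[i] & mask)] = s
                current.append(mask)
    return result

baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
found = frequent_itemsets(baskets, min_support=2)
print(sorted((sorted(k), v) for k, v in found.items()))
```

After the initial encoding, every support count works on integers in memory rather than rereading the transaction database.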
Archive | 2014
Abubakr Sirageldin; Baharum Baharudin; Low Tang Jung
Due to the rapid growth of the internet, websites have become the intruder's main target. An intruder embeds malicious content in a web page in order to carry out harmful, unwanted activities such as stealing credentials and resources, luring a user to visit a dangerous website, downloading and installing software that joins a botnet or takes part in a distributed denial-of-service attack, and even damaging the visitor's system. As the number of web pages increases, the number of malicious web pages also increases, and the attacks are becoming increasingly sophisticated. In this paper, we provide a framework for detecting malicious web pages using artificial neural network learning techniques. In addition to a significant detection rate, our objective is also to find which discriminative features characterize the attack and to reduce the false positive rate. The algorithm is based on two feature groups: URL lexical features and page content features. The experiments showed the expected results, and the high false positive rate produced by machine learning approaches is reduced.
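To make the URL lexical feature group concrete, the sketch below derives a handful of simple features from a URL string using Python's standard `urllib.parse`. The particular features are illustrative guesses at what such a group might contain; the abstract does not list the features the authors used, and the page content features and the neural network itself are not shown.

```python
import re
from urllib.parse import urlparse

def url_lexical_features(url):
    """Extract a few illustrative lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "dot_count": host.count("."),
        "digit_count": sum(ch.isdigit() for ch in url),
        # A dotted-quad host instead of a domain name is a common warning sign.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "has_at_sign": "@" in url,
        "path_depth": len([segment for segment in parsed.path.split("/") if segment]),
    }

features = url_lexical_features("http://192.168.0.1/login/update.php")
print(features)
```

A dictionary like this would be turned into a numeric vector before being passed to a classifier.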
artificial intelligence and symbolic computation | 2008
Teguh Bharata Adji; Baharum Baharudin; Norshuhani Zamin
In this paper, we present a Machine Translation (MT) system from English to Indonesian that applies the Link Grammar (LG) formalism. The Annotated Disjunct (ADJ) technique available in the LG formalism is used to map English sentences into equivalent Indonesian sentences. ADJ is a promising technique for target languages that, like Indonesian, lack an available grammar formalism, parser, and corpus. An experimental evaluation shows that applying LG to Indonesian worked as expected. We also discuss some significant issues to be considered in future development.
international conference on intelligent and advanced systems | 2012
Khairullah Khan; Baharum Baharudin
Collecting consumer opinion about products through the web is becoming more popular day by day. User opinions help consumers, retailers, and manufacturers in decision making. Because of the huge number of user reviews, it is impossible to summarize them manually. Systems are therefore required for mining consumer review data efficiently. Opinion mining is an interesting area of research due to its applications in various fields. One of the challenging issues in this area is the identification of opinion components from unstructured reviews. Opinion mining depends on natural language, so syntactic patterns play a key role in identifying the opinion components. In this paper we present an analysis of syntactic patterns for identifying product features from unstructured reviews. Noun phrases are generally used for named entity identification; however, not all noun phrases are features. The problem is how to restrict the patterns so that they yield the features. After in-depth analysis and evaluation, we identify a new pattern that shows comparatively the best results.
international conference on innovation management and technology research | 2012
Fazal Malik; Baharum Baharudin
Efficient feature extraction and effective retrieval of similar images are important steps in a content-based image retrieval (CBIR) system. Feature extraction in the compressed domain is attractive because almost all images are currently stored in compressed formats that use the block DCT (Discrete Cosine Transform). During compression some critical information is lost and only the perceptual information is left, which has significant energy for retrieval in the compressed domain. In this paper, statistical color features are extracted from quantized histograms in the DCT domain using only the DC and the first three AC coefficients of the image's DCT blocks, which carry more significant information. We study the effect of filters on image retrieval using these color features. We compare experimentally, in terms of precision, the median filter, the median filter with edge extraction, and the Laplacian filter, using the color quantized histogram features in the DCT domain. The experimental results on the Corel image database show that the Laplacian filter, with its sharpened images, gives good retrieval performance for JPEG images compared with the median filter in the DCT frequency domain.
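The low-frequency coefficients mentioned above can be illustrated by computing an orthonormal 2D DCT-II directly on one block and reading off four values. This sketch assumes the "first three AC coefficients" are the first three in JPEG zigzag order, which the abstract does not state, and it omits the level shift and quantization that a JPEG codec applies. The function names are hypothetical.

```python
import math

# Assumed (row, col) positions of the DC term and the first three AC terms
# in JPEG zigzag order.
ZIGZAG_FIRST_FOUR = [(0, 0), (0, 1), (1, 0), (2, 0)]

def dct_coefficient(block, u, v):
    """Orthonormal 2D DCT-II coefficient (u, v) of a square block."""
    n = len(block)
    scale_u = math.sqrt((1 if u == 0 else 2) / n)
    scale_v = math.sqrt((1 if v == 0 else 2) / n)
    total = 0.0
    for x in range(n):
        for y in range(n):
            total += (block[x][y]
                      * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                      * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
    return scale_u * scale_v * total

def low_frequency_features(block):
    """Return [DC, AC1, AC2, AC3] for one block."""
    return [dct_coefficient(block, u, v) for u, v in ZIGZAG_FIRST_FOUR]

# A flat block puts all of its energy in the DC term; a horizontal ramp
# also excites the first horizontal AC term.
flat = [[128] * 8 for _ in range(8)]
ramp = [[col for col in range(8)] for _ in range(8)]
print(low_frequency_features(flat))
print(low_frequency_features(ramp))
```

Collecting these four values over all blocks of an image, and then building quantized histograms from them, is the kind of step the abstract describes before statistical features are computed.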