TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON) | 2019

Practical Significance of GA PartCC in Multi-Label Classification

Abstract

Multi-label classification (MLC) is the task of learning a model that infers the correct labels of new, previously unseen objects when each object in the dataset may legitimately belong to several classes at once. While single-label classification has been thoroughly researched, the same cannot be said for MLC. A growing number of problems are now being treated as multi-label, allowing richer and more accurate knowledge mining in real-world domains such as medical diagnosis, social media and text classification. There are currently two ways of solving MLC problems: the Problem Transformation Approach and the Algorithm Adaptation Method. The former includes Classifier Chains (CC), the most effective and popular method of solving MLC problems because it is simple to implement. Unfortunately, CC has two drawbacks: (1) the order in which labels are classified is chosen at random, with no fixed logic or algorithm behind it, which results in varying accuracy; (2) every label is placed in the chain, even labels that are redundant for a particular dataset and may carry irrelevant information. This study tackles both challenges, along with others detailed later in the paper, simultaneously by applying Genetic Algorithms (GA) to a Partial CC (PartCC) model, which is a modification of CC. A toxic comments dataset is used because its classification is a multi-label text classification problem on a highly imbalanced dataset. The paper aims to create a prototype model capable of detecting various types of toxicity: neutral, toxic, severe toxic, threats, obscenity, insults and identity hate.
With the explosion of social media in the modern world and the resulting rise in online hatred and bullying, there is a need for an advanced prototype model to predict the toxicity of each class of comments.
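
The abstract names two things a genetic algorithm would have to search over: which labels enter the chain and in what order. The following is a minimal sketch of that search space only, not the authors' implementation. It assumes a candidate is encoded as an ordered subset of label indices; the operators, the truncation selection scheme and `toy_fitness` are all hypothetical stand-ins. In practice the fitness function would train a chain in the given order and return a validation score.

```python
import random

# Label names taken from the abstract; the index order here is arbitrary.
LABELS = ["neutral", "toxic", "severe toxic", "threat",
          "obscenity", "insult", "identity hate"]


def random_chain(n_labels, rng):
    """Ordered subset of label indices: one candidate partial chain."""
    size = rng.randint(1, n_labels)
    return rng.sample(range(n_labels), size)


def crossover(parent_a, parent_b, rng):
    """Keep a prefix of parent_a, then append parent_b's unused labels in order."""
    cut = rng.randint(1, len(parent_a))
    head = parent_a[:cut]
    return head + [g for g in parent_b if g not in head]


def mutate(chain, n_labels, rng):
    """Swap two positions, drop one label, or insert one missing label."""
    chain = list(chain)
    missing = [g for g in range(n_labels) if g not in chain]
    moves = []
    if len(chain) >= 2:
        moves += ["swap", "drop"]
    if missing:
        moves.append("add")
    if not moves:
        return chain
    move = rng.choice(moves)
    if move == "swap":
        i, j = rng.sample(range(len(chain)), 2)
        chain[i], chain[j] = chain[j], chain[i]
    elif move == "drop":
        chain.pop(rng.randrange(len(chain)))
    else:
        chain.insert(rng.randint(0, len(chain)), rng.choice(missing))
    return chain


def evolve(fitness, n_labels, pop_size=20, generations=30, seed=0):
    """Truncation selection with elitism (an assumed scheme, for illustration)."""
    rng = random.Random(seed)
    population = [random_chain(n_labels, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), n_labels, rng))
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


def toy_fitness(chain):
    """Hypothetical stand-in score; NOT the paper's objective."""
    score = 0.0
    if 1 in chain and 2 in chain and chain.index(1) < chain.index(2):
        score += 1.0  # pretend "toxic" should be predicted before "severe toxic"
    score -= 0.1 * len(chain)  # pretend every extra label in the chain has a cost
    return score


best, score = evolve(toy_fitness, n_labels=len(LABELS))
print([LABELS[g] for g in best], score)
```

Because a chromosome is a list of distinct indices of variable length, one encoding covers both drawbacks listed above: reordering the list changes the chain order, and leaving an index out drops that label from the chain.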

Pages 2481-2484

DOI 10.1109/TENCON.2019.8929317

Language English

Journal TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON)
