Imbalanced Data Classification Based on Hybrid Resampling and Twin Support Vector Machine

Lu Cao^{1, 3} and Hong Shen^{1, 2}

1. School of Data Science and Computer Science
   Sun Yat-sen University, Guangzhou, China
   caolu20001742@163.com, hongsh01@gmail.com
2. School of Computer Science
   University of Adelaide, Australia
3. School of Information Engineering
   Wuyi University, Jiangmen, China

Abstract

Imbalanced datasets are common in real-world applications, and identifying the minority class is usually the main goal of classification on such data. As an enhanced variant of the support vector machine (SVM), the twin support vector machine (TWSVM) provides an effective technique for data classification. However, TWSVM relies on a relatively balanced training set and class distribution to improve classification accuracy over the whole dataset, so it is not effective on imbalanced data classification problems. In this paper, we propose combining a re-sampling technique, which uses over-sampling and under-sampling to balance the training data, with TWSVM to handle imbalanced data classification. Experimental results show that our proposed approach outperforms other state-of-the-art methods.
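
To make the idea of hybrid re-sampling concrete, the following is a minimal sketch and not the method of the paper: the abstract does not specify which over-sampling and under-sampling procedures are used, so plain random over-sampling and random under-sampling stand in for them here. The helper name `hybrid_resample` and the choice of the mean class size as the balancing target are assumptions made for illustration only. The TWSVM training step is omitted.

```python
import random
from collections import defaultdict


def hybrid_resample(X, y, seed=0):
    """Balance classes by moving every class to the mean class size.

    Hypothetical helper (an assumption, not the paper's procedure):
    classes larger than the target are randomly under-sampled without
    replacement; smaller classes are randomly over-sampled by duplicating
    existing samples drawn with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    # Assumed target: the average class size, rounded to an integer.
    target = round(len(y) / len(by_class))

    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) > target:
            # Under-sample the majority class.
            chosen = rng.sample(rows, target)
        else:
            # Over-sample the minority class with random duplicates.
            extra = [rng.choice(rows) for _ in range(target - len(rows))]
            chosen = rows + extra
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


# Toy data: 90 majority samples (label 0) and 10 minority samples (label 1).
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10
X_bal, y_bal = hybrid_resample(X, y)
```

On this toy data both classes end up with the same number of samples, and the balanced set could then be passed to any classifier, such as a TWSVM implementation.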

Key words

over-sampling, under-sampling, imbalanced dataset, TWSVM, classification

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS161221017L

Publication information

Volume 20, Issue 1 (January 2023)
Year of Publication: 2023
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

How to cite

Cao, L., Shen, H.: Imbalanced Data Classification Based on Hybrid Resampling and Twin Support Vector Machine. Computer Science and Information Systems, Vol. 20, No. 1, 579–595. (2023), https://doi.org/10.2298/CSIS161221017L