Imbalanced Data Classification Based on Hybrid Resampling and Twin Support Vector Machine

Lu Cao1, 3 and Hong Shen1, 2

  1. School of Data Science and Computer Science
    Sun Yat-sen University, Guangzhou, China
    caolu20001742@163.com, hongsh01@gmail.com
  2. School of Computer Science
    University of Adelaide, Australia
  3. School of Information Engineering
    Wuyi University, Jiangmen, China

Abstract

Imbalanced datasets are common in real-world applications, and identifying the minority class is usually the focus of classification. The twin support vector machine (TWSVM), an enhanced variant of the support vector machine (SVM), provides an effective technique for data classification. However, TWSVM assumes a relatively balanced training set and class distribution and aims to improve classification accuracy over the whole dataset, so it is not effective on imbalanced data classification problems. In this paper, we propose combining TWSVM with a re-sampling technique that uses over-sampling and under-sampling to balance the training data. Experimental results show that our proposed approach outperforms other state-of-the-art methods.

Key words

over-sampling, under-sampling, imbalanced dataset, TWSVM, classification

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS161221017L

Publication information

Volume 14, Issue 3 (September 2017)
Advances in Information Technology, Distributed and Model Driven Systems
Year of Publication: 2017
ISSN: 1820-0214 (Print) 2406-1018 (Online)
Publisher: ComSIS Consortium

How to cite

Cao, L., Shen, H.: Imbalanced Data Classification Based on Hybrid Resampling and Twin Support Vector Machine. Computer Science and Information Systems, Vol. 14, No. 3, 579–595. (2017)