Active Semi-supervised Framework with Data Editing

Xue Zhang (1, 2) and Wangxin Xiao (3, 4)

  1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University
    Beijing 100101, China
  2. Key Laboratory of High Confidence Software Technologies, Ministry of Education, Peking University
    Beijing 100871, China
    jane_zhang@pku.edu.cn
  3. School of Traffic and Transportation Engineering, Changsha University of Science and Technology
    Changsha 410114, China
  4. Key Laboratory for Road Structure & Material of the Ministry of Transport
    Beijing 100088, China
    wx.xiao@rioh.cn

Abstract

Many active semi-supervised algorithms have been proposed to address the problem of insufficient training data. However, the self-labeled training data produced by semi-supervised learning may contain considerable noise, precisely because the labeled data are insufficient. Such noise may snowball in the subsequent learning process and thus hurt the generalization ability of the final hypothesis. The extremely small amount of labeled training data in sparsely labeled text classification aggravates this situation. If such noise could be identified and removed, the performance of active semi-supervised algorithms should improve. However, techniques for identifying and removing noise have seldom been explored in existing active semi-supervised algorithms. In this paper, we propose an active semi-supervised framework with data editing (ASSDE) to improve sparsely labeled text classification. A data editing technique is used to identify and remove noise introduced by semi-supervised labeling. We carry out the data editing by fully exploiting the advantages of active learning, which is novel to the best of our knowledge. The fusion of active learning with data editing makes ASSDE more robust to the sparsity and distribution bias of the training data. It also simplifies the design of the semi-supervised learning component, which makes ASSDE more efficient. Extensive experiments on several real-world text data sets show encouraging results for the proposed framework on sparsely labeled text classification, compared with several state-of-the-art methods.
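
The abstract does not spell out the editing procedure, so the following is only an illustrative sketch of the general idea, not the ASSDE algorithm itself. It assumes a simple nearest-neighbor editing rule on one-dimensional toy data: a self-labeled example is treated as suspect when its label disagrees with the majority vote of its k nearest neighbors, and suspect examples are sent to a human oracle for relabeling (the active learning step). The function names, the toy data, and the `oracle` callback are all hypothetical.

```python
from collections import Counter


def nearest(train, x, k):
    """Return the k (point, label) pairs in train closest to x."""
    return sorted(train, key=lambda item: abs(item[0] - x))[:k]


def knn_label(train, x, k=3):
    """Majority label among the k nearest neighbors of x."""
    votes = Counter(label for _, label in nearest(train, x, k))
    return votes.most_common(1)[0][0]


def edit_self_labeled(labeled, self_labeled, oracle, k=3):
    """Assumed editing rule: query the oracle for every self-labeled
    example whose label disagrees with its k-NN vote."""
    cleaned = []
    queries = 0
    for i, (x, y) in enumerate(self_labeled):
        # Neighbors come from all other examples, excluding x itself.
        others = labeled + self_labeled[:i] + self_labeled[i + 1:]
        if knn_label(others, x, k) != y:
            y = oracle(x)  # active learning: ask for the true label
            queries += 1
        cleaned.append((x, y))
    return cleaned, queries


# Toy data: two human-labeled points and five self-labeled points,
# one of which (2.0, "b") is deliberately mislabeled.
labeled = [(0.0, "a"), (10.0, "b")]
self_labeled = [(1.0, "a"), (2.0, "b"), (3.0, "a"), (8.0, "b"), (9.0, "b")]


def oracle(x):
    """Hypothetical annotator: class "a" lies below 5, class "b" above."""
    return "a" if x < 5 else "b"


cleaned, queries = edit_self_labeled(labeled, self_labeled, oracle)
```

In this toy run only the mislabeled point triggers an oracle query, which illustrates why combining editing with active learning can keep the labeling cost low while stopping self-labeling noise from propagating.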

Key words

sparsely labeled text classification; active learning; semi-supervised learning; data editing

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS120202045Z

Publication information

Volume 9, Issue 4 (December 2012)
Special Issue on Recent Advances in Systems and Informatics
Year of Publication: 2012
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

How to cite

Zhang, X., Xiao, W.: Active Semi-supervised Framework with Data Editing. Computer Science and Information Systems, Vol. 9, No. 4, 1513-1532. (2012), https://doi.org/10.2298/CSIS120202045Z