Distance based Clustering of Class Association Rules to Build a Compact, Accurate and Descriptive Classifier

Jamolbek Mattiev (1, 3) and Branko Kavšek (1, 2)

  1. University of Primorska
    Glagoljaška 8, 6000 Koper, Slovenia
    jamolbek.mattiev@famnit.upr.si, branko.kavsek@upr.si
  2. Jožef Stefan Institute
    Jamova cesta 39, 1000 Ljubljana, Slovenia
    Branko.kavsek@ijs.si
  3. Urgench State University
    Khamid Alimjan str 14, 220100 Urgench, Uzbekistan
    jamolbek_1992@mail.ru

Abstract

Huge amounts of data are being collected and analyzed nowadays. With popular rule-learning algorithms, the number of rules discovered on those “big” datasets can easily exceed thousands. To produce compact, understandable and accurate classifiers, such rules have to be grouped and pruned, so that only a reasonable number of them are presented to the end user for inspection and further analysis. In this paper, we propose new methods that reduce the number of class association rules produced by “classical” class association rule classifiers, while maintaining a classification model whose accuracy is comparable to that of models generated by state-of-the-art classification algorithms. More precisely, we propose new associative classifiers, called DC, DDC and CDC, that use distance-based agglomerative hierarchical clustering as a post-processing step to reduce the number of rules; in the rule-selection step, each algorithm uses a different strategy (based on database coverage and cluster center). Experimental results on selected datasets from the UCI ML repository show that our methods learn classifiers containing significantly fewer rules than state-of-the-art rule-learning algorithms on datasets with a larger number of examples. At the same time, the classification accuracy of the proposed classifiers is not significantly different from that of state-of-the-art rule learners on most of the datasets.
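
The general idea of clustering class association rules and keeping one representative per cluster can be illustrated with a minimal Python sketch. It is not the DC, DDC or CDC algorithm itself: the Jaccard distance on rule antecedents, the complete-linkage merge criterion, the medoid used as "cluster center", the toy rules and all function names are assumptions made for illustration only, and the abstract does not specify any of them.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Jaccard distance between two rule antecedents (assumed measure)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def agglomerative(rules, n_clusters):
    """Naive complete-linkage agglomerative clustering (assumed linkage)."""
    clusters = [[i] for i in range(len(rules))]
    while len(clusters) > n_clusters:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is that of the farthest pair.
            d = max(jaccard_distance(rules[i][0], rules[j][0])
                    for i in clusters[x] for j in clusters[y])
            if best is None or d < best[0]:
                best = (d, x, y)
        _, x, y = best
        clusters[x] += clusters[y]   # merge the two closest clusters
        del clusters[y]              # x < y, so index x stays valid
    return clusters

def medoid(rules, cluster):
    """Index of the rule with the smallest total distance to its cluster."""
    return min(cluster, key=lambda i: sum(
        jaccard_distance(rules[i][0], rules[j][0]) for j in cluster))

# Hypothetical rules (antecedent, class); all predict the same class,
# since this sketch assumes rules are clustered class by class.
rules = [
    (frozenset({"outlook=sunny", "humidity=high"}), "no"),
    (frozenset({"outlook=sunny", "humidity=high", "wind=weak"}), "no"),
    (frozenset({"outlook=sunny"}), "no"),
    (frozenset({"outlook=rain", "wind=strong"}), "no"),
    (frozenset({"outlook=rain", "wind=strong", "temp=cool"}), "no"),
]

clusters = agglomerative(rules, n_clusters=2)
representatives = [medoid(rules, c) for c in clusters]
```

Here five rules collapse into two clusters, and only the two representative rules would be kept in the final classifier. A coverage-based selection strategy would instead pick, within each cluster, rules according to the training examples they cover.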

Key words

Frequent Itemset, Class Association Rules (CAR), Associative Classification, Agglomerative Clustering

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS200430037M

Publication information

Volume 18, Issue 3 (June 2021)
Year of Publication: 2021
ISSN: 1820-0214 (Print) 2406-1018 (Online)
Publisher: ComSIS Consortium

How to cite

Mattiev, J., Kavšek, B.: Distance based Clustering of Class Association Rules to Build a Compact, Accurate and Descriptive Classifier. Computer Science and Information Systems, Vol. 18, No. 3, 791–811. (2021), https://doi.org/10.2298/CSIS200430037M