Density-Based Clustering with Constraints

Piotr Lasek1 and Jarek Gryz2

  1. University of Rzeszow
    Poland
    lasek@ur.edu.pl
  2. York University
    Canada
    jarek@cse.yorku.ca

Abstract

In this paper we present our ic-NBC and ic-DBSCAN algorithms for data clustering with constraints. The algorithms are based on density-based clustering algorithms NBC and DBSCAN but allow users to incorporate background knowledge into the process of clustering by means of instance constraints. The knowledge about anticipated groups can be applied by specifying the so-called must-link and cannot-link relationships between objects or points. These relationships are then incorporated into the clustering process. In the proposed algorithms this is achieved by properly merging resulting clusters and introducing a new notion of deferred points which are temporarily excluded from clustering and assigned to clusters based on their involvement in cannot-link relationships. To examine the algorithms, we have carried out a number of experiments. We used benchmark data sets and tested the efficiency and quality of the results. We have also measured the efficiency of the algorithms against their original versions. The experiments prove that the introduction of instance constraints improves the quality of both algorithms. The efficiency is only insignificantly reduced and is due to extra computation related to the introduced constraints.

Key words

data mining, data clustering, semi-supervised clustering, clustering with constraints, instance-level constraints

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS123456789X

Publication information

Volume 16, Issue 2 (June 2019)
Year of Publication: 2019
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

Full text

DownloadAvailable in PDF
Portable Document Format

How to cite

Lasek, P., Gryz, J.: Density-Based Clustering with Constraints. Computer Science and Information Systems, Vol. 16, No. 2, 469-489. (2019), https://doi.org/10.2298/CSIS123456789X