dbscan_clusterer
DBSCAN clusterer. Uses deterministic density-based clustering based on epsilon neighborhoods and minimum point counts. Supports continuous attributes only.
The library implements the clusterer_protocol defined in the
clustering_protocols library. It provides predicates for learning a
clusterer from a dataset, assigning new instances to clusters, and
exporting the learned clusterer as a list of predicate clauses or to a
file.
Datasets are represented as objects implementing the
clustering_dataset_protocol protocol from the
clustering_protocols library.
API documentation
Open the ../../apis/library_index.html#dbscan_clusterer link in a web browser.
Loading
To load this library, load the loader.lgt file:
| ?- logtalk_load(dbscan_clusterer(loader)).
Testing
To test this library predicates, load the tester.lgt file:
| ?- logtalk_load(dbscan_clusterer(tester)).
To run the performance benchmark suite, load the
tester_performance.lgt file:
| ?- logtalk_load(dbscan_clusterer(tester_performance)).
Features
Density-Based Clustering: Learns density-connected clusters using epsilon neighborhoods and minimum point counts.
Adaptive Neighborhood Indexing: Builds a low-dimensional epsilon-grid index when it is likely to be cheaper and otherwise falls back to a deterministic metric tree. The metric-tree path uses adaptive leaf sizing, balanced tie-aware subtree splits, selectable exact or heuristic pivot scoring, and lower-allocation range-query traversal.
Continuous Datasets: Accepts datasets containing only continuous attributes.
Distance Metrics: Supports
euclideanandmanhattandistances.Optional Feature Scaling: Continuous attributes can be standardized using z-score scaling.
Reachable-Core Prediction: New instances are assigned to the cluster of the nearest reachable core point within the learned epsilon radius; otherwise the atom
noiseis returned.Portable Export: Learned clusterers can be exported as clauses or files and reused later.
Options
The following options can be passed to the learn/3 predicate:
epsilon(Epsilon): Neighborhood radius used to determine density connectivity. Default is1.0.minimum_points(MinimumPoints): Minimum number of points in an epsilon neighborhood for a core point. Default is2.distance_metric(Metric): Distance metric to use. Options:euclidean(default) ormanhattan.feature_scaling(FeatureScaling): Whether to standardize continuous attributes before clustering. Options:on(default) oroff.pivot_scoring(PivotScoring): Metric-tree pivot scoring strategy. Options:heuristic(default, single-pass dispersion scoring with one final sort) orexact(more expensive gap-and-range scoring that sorts each candidate profile).
Clusterer representation
The learned clusterer is represented as a compound term with the functor chosen by the user when exporting the clusterer and arity 4. For example:
dbscan_clusterer(Encoders, Clusters, Noise, Options)
Where:
Encoders: List of continuous attribute encoders storing attribute name, mean, and scale.Clusters: List ofcluster(Id, CorePoints, BorderPoints)terms in cluster-id order.Noise: List of encoded training points classified as noise.Options: Effective training options used to learn the clusterer.
References
Ester, Kriegel, Sander, and Xu (1996) - “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise”. KDD, 226-231.