kmedoids_clusterer
k-Medoids clusterer. It uses an iterative medoid-update algorithm with deterministic initialization and deterministic cluster assignments. Supports continuous attributes only.
The library implements the clusterer_protocol defined in the
clustering_protocols library. It provides predicates for learning a
clusterer from a dataset, assigning new instances to clusters, and
exporting the learned clusterer as a list of predicate clauses or to a
file.
Datasets are represented as objects implementing the
clustering_dataset_protocol protocol from the
clustering_protocols library.
API documentation
Open the ../../apis/library_index.html#kmedoids_clusterer link in a web browser.
Loading
To load this library, load the loader.lgt file:
| ?- logtalk_load(kmedoids_clusterer(loader)).
Testing
To test this library predicates, load the tester.lgt file:
| ?- logtalk_load(kmedoids_clusterer(tester)).
To run the performance benchmark suite, load the
tester_performance.lgt file:
| ?- logtalk_load(kmedoids_clusterer(tester_performance)).
Features
Continuous Datasets: Accepts datasets containing only continuous attributes.
Distance Metrics: Supports Euclidean and Manhattan distances.
Deterministic Initialization: Supports
first_kand deterministicspreadinitialization that repeatedly chooses the farthest example from the medoids selected so far.Optional Feature Scaling: Continuous attributes can be standardized using z-score scaling.
Rich Training Diagnostics: Learned clusterers report training example count, convergence status, iteration count, and final medoid shift.
Portable Export: Learned clusterers can be exported as clauses or files and reused later.
Stable Empty-Cluster Handling: Empty clusters keep their previous medoids instead of failing.
Options
The following options can be passed to the learn/3 predicate:
k(K): Number of clusters to learn. Default is2.maximum_iterations(Iterations): Maximum number of medoid-update iterations. Default is100.tolerance(Tolerance): Maximum medoid shift threshold for convergence. Default is1.0e-6.initialization(Initialization): Medoid initialization strategy. Options:spread(default) orfirst_k.distance_metric(Metric): Distance metric to use. Options:euclidean(default) ormanhattan.feature_scaling(FeatureScaling): Whether to standardize continuous attributes before clustering. Options:on(default) oroff.
Diagnostics
The diagnostics/2 predicate returns a list containing:
model(kmedoids_clusterer)medoid_count(Count)training_example_count(Count)convergence(Reason)iterations(Count)final_shift(Shift)options(Options)
Clusterer representation
The learned clusterer is represented as a compound term with the functor chosen by the user when exporting the clusterer and arity 4. For example:
kmedoids_clusterer(Encoders, Medoids, Options, Diagnostics)
Where:
Encoders: List of continuous attribute encoders storing attribute name, mean, and scale.Medoids: List of medoid vectors in cluster-id order.Options: Effective training options used to learn the clusterer.Diagnostics: Training diagnostics metadata returned by thediagnostics/2predicate.
References
Kaufman & Rousseeuw (1987) - “Clustering by Means of Medoids”. In Statistical Data Analysis Based on the L1-Norm and Related Methods.
Kaufman & Rousseeuw (1990) - “Finding Groups in Data: An Introduction to Cluster Analysis”. Wiley.