knn_classifier

k-Nearest Neighbors classifier supporting multiple distance metrics, weighting schemes, and both categorical and continuous features. This is a lazy learning algorithm that classifies instances based on the majority class among the k nearest training instances.

The library implements the classifier_protocol defined in the classification_protocols library. It provides predicates for learning a classifier from a dataset, using it to make predictions, and exporting it as a list of predicate clauses or to a file.

Datasets are represented as objects implementing the dataset_protocol protocol from the classification_protocols library. See the test_files directory for examples.
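
As a rough, purely illustrative sketch, a dataset object could be structured along the following lines; the predicate names used here (attributes/1 and instances/1) are assumptions, not necessarily those declared by dataset_protocol, so rely on the protocol documentation and the test_files examples for the actual interface.

% Hypothetical dataset object; attributes/1 and instances/1 are assumed
% predicate names, not necessarily those declared by dataset_protocol
:- object(toy_dataset,
    implements(dataset_protocol)).

    % attribute names, in order
    attributes([outlook, temperature, humidity]).

    % attribute values paired with the class label
    instances([
        [sunny,    30, 85]-no,
        [overcast, 21, 70]-yes,
        [rain,     18, 95]-yes
    ]).

:- end_object.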

API documentation

Open the ../../docs/library_index.html#knn_classifier link in a web browser.

Loading

To load this library, load the loader.lgt file:

| ?- logtalk_load(knn_classifier(loader)).

Testing

To test this library's predicates, load the tester.lgt file:

| ?- logtalk_load(knn_classifier(tester)).

Features

  • Multiple Distance Metrics: Euclidean, Manhattan, Chebyshev, Minkowski (the standard definitions are recalled after this list).

  • Flexible Weighting: Uniform, distance-based, and Gaussian weighting of neighbors.

  • Mixed Features: Automatically handles categorical and continuous features.

  • Configurable Options: k value, distance metric, and weighting scheme via predicate options.

  • Probability Estimation: Provides confidence scores for predictions.

  • Classifier Export: Learned classifiers can be exported as predicate clauses.
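
For reference, the distance metrics listed above have the following standard textbook definitions for numeric features; the library's exact treatment of categorical features (commonly a 0/1 mismatch distance) is defined by its source.

euclidean:  d(X, Y) = sqrt(sum_i (Xi - Yi)^2)
manhattan:  d(X, Y) = sum_i |Xi - Yi|
chebyshev:  d(X, Y) = max_i |Xi - Yi|
minkowski:  d(X, Y) = (sum_i |Xi - Yi|^p)^(1/p)

For the weighting schemes, uniform gives every neighbor the same vote, while distance and Gaussian weighting typically down-weight votes as 1/d and exp(-d^2), respectively; these are the common conventions, and the exact formulas used by the library may differ.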

Usage

Learning a Classifier

% Learn from a dataset object with default options (k=3, euclidean, uniform)
| ?- knn_classifier::learn(my_dataset, Classifier).
...

% Learn with custom options
| ?- knn_classifier::learn(my_dataset, Classifier, [k(5), distance_metric(manhattan)]).
...

Making Predictions

% Predict class for a new instance
| ?- Instance = [attr1-value1, attr2-value2, ...],
     knn_classifier::learn(my_dataset, Classifier),
     knn_classifier::predict(Classifier, Instance, PredictedClass).
PredictedClass = ...
...

% Predict with custom options
| ?- knn_classifier::predict(Classifier, Instance, PredictedClass, [k(5), weight_scheme(distance)]).
...

% Get probability distribution
| ?- knn_classifier::predict_probabilities(Classifier, Instance, Probabilities).
Probabilities = [class1-0.67, class2-0.33]
...
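
The predict_probabilities/4 variant accepts the same options as predict/4 (see the Options section below). A hypothetical query, assuming a previously learned Classifier and an Instance in the attribute-value format shown above (the bindings are illustrative only):

% Probability distribution with custom options (illustrative values)
| ?- knn_classifier::predict_probabilities(Classifier, Instance, Probabilities, [k(5), weight_scheme(gaussian)]).
Probabilities = [class1-0.58, class2-0.42]
...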

Exporting the Classifier

Learned classifiers can be exported as a list of clauses or to a file for later use.

% Export as predicate clauses
| ?- knn_classifier::learn(my_dataset, Classifier),
     knn_classifier::export_to_clauses(my_dataset, Classifier, my_classifier, Clauses).
Clauses = [my_classifier(...)]
...

% Export to a file
| ?- knn_classifier::learn(my_dataset, Classifier),
     knn_classifier::export_to_file(my_dataset, Classifier, my_classifier, 'classifier.pl').
...

Using a Learned Classifier

Learned and saved classifiers can later be used for predictions without needing to access the original training dataset.

% Later, load the file and use the classifier
| ?- consult('classifier.pl'),
     my_classifier(Classifier),
     Instance = [...],
     knn_classifier::predict(Classifier, Instance, Class).
Class = ...
...

Options

The following options can be passed to the learn/3, predict/4, and predict_probabilities/4 predicates (a combined example is given after this list):

  • k(K): Number of neighbors to consider (default: 3)

  • distance_metric(Metric): Distance metric to use. Options: euclidean (default), manhattan, chebyshev, minkowski

  • weight_scheme(Scheme): Weighting scheme for neighbor votes. Options: uniform (default), distance, gaussian
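
For example, all three options can be combined in a single prediction call (an illustrative query, assuming a previously learned Classifier and a bound Instance; the option values are arbitrary):

% Combine k, metric, and weighting scheme in one call
| ?- knn_classifier::predict(Classifier, Instance, Class, [k(7), distance_metric(chebyshev), weight_scheme(distance)]).
...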

Classifier representation

The learned classifier is represented as a compound term:

knn_classifier(AttributeNames, FeatureTypes, Instances)

Where:

  • AttributeNames: List of attribute names in order

  • FeatureTypes: List of types (numeric or categorical)

  • Instances: List of Values-Class pairs (the training data in compact form)

When exported using export_to_clauses/4 or export_to_file/4, this classifier term is serialized directly as the single argument of the generated predicate clause so that the exported model can be loaded and reused as-is.
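
For illustration only, a classifier learned from a hypothetical dataset with two numeric attributes and exported with export_to_clauses/4 using the functor my_classifier might yield a clause along these lines (attribute names, values, and classes are made up):

% Illustrative exported clause; all data is invented for the example
my_classifier(
    knn_classifier(
        [height, weight],
        [numeric, numeric],
        [[170, 65]-medium, [185, 90]-large, [160, 55]-small]
    )
).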

References

  1. Cover, T. & Hart, P. (1967). “Nearest neighbor pattern classification”. IEEE Transactions on Information Theory.

  2. Hastie, T., Tibshirani, R., & Friedman, J. (2009). “The Elements of Statistical Learning”. Chapter 13.

  3. Mitchell, T. (1997). “Machine Learning”. Chapter 8: Instance-Based Learning.