.. _library_knn_classifier:

``knn_classifier``
==================

A k-nearest neighbors (k-NN) classifier supporting multiple distance
metrics, neighbor weighting schemes, and both categorical and continuous
features. This is a lazy learning algorithm: learning simply stores the
training instances, and a new instance is assigned the majority class
(optionally weighted by distance) among its k nearest training instances.
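
The basic rule can be illustrated with a minimal, hypothetical Logtalk
sketch. The ``knn_vote_sketch`` object and its predicates are
illustrative names and not part of this library's API. The sketch assumes
purely numeric features, Euclidean distance and uniform weights, and it
represents the training data as ``Values-Class`` pairs:

::

   % Hypothetical sketch of the basic k-NN rule; not this library's code.
   % Assumes numeric features, Euclidean distance, and uniform weights.
   :- object(knn_vote_sketch).

       :- public(classify/4).

       % classify(+Training, +Instance, +K, -Class)
       % Training is a list of Values-Class pairs; Instance is a list of numbers
       classify(Training, Instance, K, Class) :-
           distances(Training, Instance, Pairs),
           keysort(Pairs, Sorted),             % nearest neighbors first
           take(K, Sorted, Nearest),
           classes(Nearest, Classes),
           majority(Classes, Class).

       distances([], _, []).
       distances([Values-Class| Training], Instance, [Distance-Class| Pairs]) :-
           squared_sum(Values, Instance, 0, Sum),
           Distance is sqrt(Sum),
           distances(Training, Instance, Pairs).

       squared_sum([], [], Sum, Sum).
       squared_sum([X| Xs], [Y| Ys], Sum0, Sum) :-
           Sum1 is Sum0 + (X - Y) * (X - Y),
           squared_sum(Xs, Ys, Sum1, Sum).

       % take(+N, +List, -Prefix): the first N elements (or fewer)
       take(0, _, []) :- !.
       take(_, [], []) :- !.
       take(N, [X| Xs], [X| Ys]) :-
           M is N - 1,
           take(M, Xs, Ys).

       classes([], []).
       classes([_-Class| Pairs], [Class| Classes]) :-
           classes(Pairs, Classes).

       % majority(+Classes, -Class): most frequent class; ties are resolved
       % in favor of the class that comes first in standard order of terms
       majority(Classes, Class) :-
           sort(Classes, Unique),
           counts(Unique, Classes, Counts),
           Counts = [First| Rest],
           best(Rest, First, _-Class).

       counts([], _, []).
       counts([Class| Unique], Classes, [Count-Class| Counts]) :-
           occurrences(Classes, Class, 0, Count),
           counts(Unique, Classes, Counts).

       occurrences([], _, Count, Count).
       occurrences([X| Xs], Class, Count0, Count) :-
           (   X == Class -> Count1 is Count0 + 1
           ;   Count1 = Count0
           ),
           occurrences(Xs, Class, Count1, Count).

       best([], Best, Best).
       best([Count-Class| Counts], Count0-Class0, Best) :-
           (   Count > Count0 -> best(Counts, Count-Class, Best)
           ;   best(Counts, Count0-Class0, Best)
           ).

   :- end_object.

For example, calling
``knn_vote_sketch::classify([[0,0]-a, [0,1]-a, [5,5]-b, [6,5]-b], [1,1], 3, Class)``
gives ``Class = a``, because two of the three nearest training instances
belong to class ``a``.
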

The library implements the ``classifier_protocol`` defined in the
``classification_protocols`` library. It provides predicates for
learning a classifier from a dataset, using it to make predictions, and
exporting it as a list of predicate clauses or to a file.

Datasets are represented as objects implementing the
``dataset_protocol`` protocol from the ``classification_protocols``
library. See the ``test_files`` directory for examples.

API documentation
-----------------

Open the
`../../docs/library_index.html#knn_classifier <../../docs/library_index.html#knn_classifier>`__
link in a web browser.

Loading
-------

To load this library, load the ``loader.lgt`` file:

::

   | ?- logtalk_load(knn_classifier(loader)).

Testing
-------

To test this library's predicates, load the ``tester.lgt`` file:

::

   | ?- logtalk_load(knn_classifier(tester)).

Features
--------

- **Multiple Distance Metrics**: Euclidean, Manhattan, Chebyshev,
  Minkowski.
- **Flexible Weighting**: Uniform, distance-based, and Gaussian
  weighting of neighbors.
- **Mixed Features**: Automatically handles categorical and continuous
  features.
- **Configurable Options**: k value, distance metric, and weighting
  scheme via predicate options.
- **Probability Estimation**: Provides confidence scores for
  predictions.
- **Classifier Export**: Learned classifiers can be exported as
  predicate clauses.
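
The four distance metrics can be sketched as follows. This is a
hypothetical illustration: ``distance_sketch`` and ``distance/5`` are
not part of the library API. It also shows one common way of mixing
feature types. Numeric features contribute their absolute difference, and
categorical features contribute 0 when equal and 1 otherwise. That
mismatch rule and the explicit Minkowski order in ``minkowski(P)`` are
assumptions, and the library's own handling may differ:

::

   % Hypothetical sketch of the distance metrics; not this library's code.
   % Assumptions: numeric features contribute |X - Y|, categorical features
   % contribute 0 when equal and 1 otherwise, and minkowski(P) carries the
   % Minkowski order P explicitly.
   :- object(distance_sketch).

       :- public(distance/5).

       % distance(+Metric, +FeatureTypes, +Values1, +Values2, -Distance)
       distance(euclidean, Types, Xs, Ys, Distance) :-
           differences(Types, Xs, Ys, Ds),
           power_sum(Ds, 2, 0, Sum),
           Distance is sqrt(Sum).
       distance(manhattan, Types, Xs, Ys, Distance) :-
           differences(Types, Xs, Ys, Ds),
           power_sum(Ds, 1, 0, Distance).
       distance(chebyshev, Types, Xs, Ys, Distance) :-
           differences(Types, Xs, Ys, Ds),
           maximum(Ds, 0, Distance).
       distance(minkowski(P), Types, Xs, Ys, Distance) :-
           differences(Types, Xs, Ys, Ds),
           power_sum(Ds, P, 0, Sum),
           Distance is Sum ** (1.0 / P).

       % per-feature differences according to the feature type
       differences([], [], [], []).
       differences([numeric| Types], [X| Xs], [Y| Ys], [D| Ds]) :-
           D is abs(X - Y),
           differences(Types, Xs, Ys, Ds).
       differences([categorical| Types], [X| Xs], [Y| Ys], [D| Ds]) :-
           (   X == Y -> D = 0
           ;   D = 1
           ),
           differences(Types, Xs, Ys, Ds).

       % sum of the differences raised to the power P
       power_sum([], _, Sum, Sum).
       power_sum([D| Ds], P, Sum0, Sum) :-
           Sum1 is Sum0 + D ** P,
           power_sum(Ds, P, Sum1, Sum).

       % largest difference (Chebyshev distance)
       maximum([], Max, Max).
       maximum([D| Ds], Max0, Max) :-
           Max1 is max(Max0, D),
           maximum(Ds, Max1, Max).

   :- end_object.

Take ``[0, 0]`` and ``[3, 4]`` with two numeric features. Their distance
is 5 with ``euclidean``, 7 with ``manhattan`` and 4 with ``chebyshev``.
Minkowski with P = 1 and P = 2 reproduces the Manhattan and Euclidean
distances, respectively.
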

Usage
-----

Learning a Classifier
~~~~~~~~~~~~~~~~~~~~~

::

   % Learn from a dataset object with default options (k=3, euclidean, uniform)
   | ?- knn_classifier::learn(my_dataset, Classifier).
   ...

   % Learn with custom options
   | ?- knn_classifier::learn(my_dataset, Classifier, [k(5), distance_metric(manhattan)]).
   ...

Making Predictions
~~~~~~~~~~~~~~~~~~

::

   % Predict class for a new instance
   | ?- Instance = [attr1-value1, attr2-value2, ...],
        knn_classifier::learn(my_dataset, Classifier),
        knn_classifier::predict(Classifier, Instance, PredictedClass).
   PredictedClass = ...
   ...

   % Predict with custom options
   | ?- knn_classifier::predict(Classifier, Instance, PredictedClass, [k(5), weight_scheme(distance)]).
   ...

   % Get probability distribution
   | ?- knn_classifier::predict_probabilities(Classifier, Instance, Probabilities).
   Probabilities = [class1-0.67, class2-0.33]
   ...
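
Probability estimates can be understood as normalized, possibly
weighted, votes. In the hypothetical sketch below, ``probability_sketch``
is an illustrative name and not part of the library. Each of the k
nearest neighbors contributes a ``Weight-Class`` vote, and each class
probability is that class's share of the total weight. With uniform
weights and k = 3, two votes for ``class1`` and one for ``class2`` give
about 0.67 and 0.33. This matches the shape of the example output above:

::

   % Hypothetical sketch of probability estimation from neighbor votes;
   % not this library's code. Votes is a list of Weight-Class pairs for
   % the k nearest neighbors (all weights 1.0 for the uniform scheme).
   :- object(probability_sketch).

       :- public(probabilities/2).

       % probabilities(+Votes, -Probabilities) with Class-Probability pairs
       probabilities(Votes, Probabilities) :-
           vote_classes(Votes, Classes0),
           sort(Classes0, Classes),          % distinct classes
           total(Votes, 0.0, Total),
           shares(Classes, Votes, Total, Probabilities).

       vote_classes([], []).
       vote_classes([_-Class| Votes], [Class| Classes]) :-
           vote_classes(Votes, Classes).

       total([], Total, Total).
       total([Weight-_| Votes], Total0, Total) :-
           Total1 is Total0 + Weight,
           total(Votes, Total1, Total).

       % each class probability is its share of the total weight
       shares([], _, _, []).
       shares([Class| Classes], Votes, Total, [Class-Probability| Probabilities]) :-
           class_total(Votes, Class, 0.0, Sum),
           Probability is Sum / Total,
           shares(Classes, Votes, Total, Probabilities).

       class_total([], _, Sum, Sum).
       class_total([Weight-Class0| Votes], Class, Sum0, Sum) :-
           (   Class0 == Class -> Sum1 is Sum0 + Weight
           ;   Sum1 = Sum0
           ),
           class_total(Votes, Class, Sum1, Sum).

   :- end_object.
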

Exporting the Classifier
~~~~~~~~~~~~~~~~~~~~~~~~

Learned classifiers can be exported as a list of clauses or to a file
for later use.

::

   % Export as predicate clauses
   | ?- knn_classifier::learn(my_dataset, Classifier),
        knn_classifier::export_to_clauses(my_dataset, Classifier, my_classifier, Clauses).
   Clauses = [my_classifier(...)]
   ...

   % Export to a file
   | ?- knn_classifier::learn(my_dataset, Classifier),
        knn_classifier::export_to_file(my_dataset, Classifier, my_classifier, 'classifier.pl').
   ...

Using a Learned Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~

Classifiers saved with ``export_to_file/4`` can later be loaded and used
for predictions without access to the original training dataset.

::

   % Later, load the file and use the classifier
   | ?- consult('classifier.pl'),
        my_classifier(Classifier),
        Instance = [...],
        knn_classifier::predict(Classifier, Instance, Class).
   Class = ...
   ...

Options
-------

The following options can be passed to the ``predict/4`` and
``predict_probabilities/4`` predicates:

- ``k(K)``: Number of neighbors to consider (default: 3)
- ``distance_metric(Metric)``: Distance metric to use. Options:
  ``euclidean`` (default), ``manhattan``, ``chebyshev``, ``minkowski``
- ``weight_scheme(Scheme)``: Weighting scheme for neighbor votes.
  Options: ``uniform`` (default), ``distance``, ``gaussian``
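
The three weighting schemes can be sketched as functions from a
neighbor's distance to its vote weight. This is a hypothetical
illustration: ``weight_sketch`` and ``weight/3`` are not library
predicates. The exact formulas are assumptions. The ``distance`` scheme
is taken as inverse distance with a small floor to avoid division by
zero, and the ``gaussian`` scheme as a unit-bandwidth kernel:

::

   % Hypothetical sketch of the neighbor weighting schemes; not this
   % library's code. Assumptions: "distance" uses inverse distance with a
   % small floor to avoid division by zero, and "gaussian" uses a kernel
   % with unit bandwidth, exp(-D^2 / 2).
   :- object(weight_sketch).

       :- public(weight/3).

       % weight(+Scheme, +Distance, -Weight)
       weight(uniform, _, 1.0).
       weight(distance, Distance, Weight) :-
           Weight is 1.0 / max(Distance, 1.0e-10).
       weight(gaussian, Distance, Weight) :-
           Weight is exp(-0.5 * Distance * Distance).

   :- end_object.

Each neighbor's vote for its class is multiplied by its weight. As a
result, closer neighbors count more under the ``distance`` and
``gaussian`` schemes, while every neighbor counts equally under
``uniform``.
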

Classifier representation
-------------------------

The learned classifier is represented as a compound term:

::

   knn_classifier(AttributeNames, FeatureTypes, Instances)

Where:

- ``AttributeNames``: List of attribute names in order
- ``FeatureTypes``: List of types (``numeric`` or ``categorical``)
- ``Instances``: List of ``Values-Class`` pairs (the training data in
  compact form)
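
The hypothetical sketch below makes the representation concrete.
``representation_sketch`` and ``build/2`` are illustrative, not library
predicates. It builds a term of this shape from examples written as
``AttributeValuePairs-Class``, using the ``Attribute-Value`` instance
format shown in the prediction examples. It assumes that every example
lists its attributes in the same order. It also assumes that a feature is
``numeric`` when all its training values are numbers and ``categorical``
otherwise:

::

   % Hypothetical sketch: building a term with the documented shape
   % knn_classifier(AttributeNames, FeatureTypes, Instances); not this
   % library's code. Assumes all examples list attributes in the same order
   % and that a feature is numeric iff all its training values are numbers.
   :- object(representation_sketch).

       :- public(build/2).

       % build(+Examples, -Classifier) where Examples is a non-empty list
       % of [Attribute-Value, ...]-Class pairs
       build([Pairs-Class| Examples], knn_classifier(Names, Types, Instances)) :-
           names(Pairs, Names),
           instances([Pairs-Class| Examples], Instances),
           types(Instances, Types).

       names([], []).
       names([Name-_| Pairs], [Name| Names]) :-
           names(Pairs, Names).

       % convert each example into the compact Values-Class form
       instances([], []).
       instances([Pairs-Class| Examples], [Values-Class| Instances]) :-
           values(Pairs, Values),
           instances(Examples, Instances).

       values([], []).
       values([_-Value| Pairs], [Value| Values]) :-
           values(Pairs, Values).

       % infer one type per column, peeling off the first value of each row
       types([[]-_| _], []) :-
           !.
       types(Instances, [Type| Types]) :-
           column(Instances, Column, Rest),
           (   numbers(Column) -> Type = numeric
           ;   Type = categorical
           ),
           types(Rest, Types).

       column([], [], []).
       column([[Value| Values]-Class| Instances], [Value| Column], [Values-Class| Rest]) :-
           column(Instances, Column, Rest).

       numbers([]).
       numbers([Value| Values]) :-
           number(Value),
           numbers(Values).

   :- end_object.

For example, building from
``[[age-30, color-red]-yes, [age-45, color-blue]-no]`` gives
``knn_classifier([age, color], [numeric, categorical], [[30, red]-yes, [45, blue]-no])``.
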

When exported using ``export_to_clauses/4`` or ``export_to_file/4``,
this classifier term is serialized directly as the single argument of
the generated predicate clause so that the exported model can be loaded
and reused as-is.
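
Because the classifier is a plain compound term, writing it as the
argument of a clause is enough to persist it. The sketch below uses only
standard Prolog stream predicates to show the idea. ``save_sketch``,
``save/3`` and ``load/3`` are hypothetical names. The actual output of
``export_to_file/4`` (for example, any comments or layout) may differ:

::

   % Hypothetical sketch of persisting and reloading a classifier term as
   % the single argument of a Functor/1 clause; not this library's code.
   :- object(save_sketch).

       :- public([save/3, load/3]).

       % save(+File, +Functor, +Classifier): write the clause Functor(Classifier).
       save(File, Functor, Classifier) :-
           Clause =.. [Functor, Classifier],
           open(File, write, Stream),
           writeq(Stream, Clause),
           write(Stream, '.'),
           nl(Stream),
           close(Stream).

       % load(+File, +Functor, -Classifier): read the clause back
       load(File, Functor, Classifier) :-
           open(File, read, Stream),
           read(Stream, Clause),
           close(Stream),
           Clause =.. [Functor, Classifier].

   :- end_object.
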

