.. _library_logistic_regression_classifier:

``logistic_regression_classifier``
==================================

Logistic regression classifier supporting both binary and multiclass
classification. Multiclass classification is implemented using batch
gradient descent to train a single multiclass softmax model. Binary
classification is treated as a two-class special case of the same
objective.

The library implements the ``classifier_protocol`` defined in the
``classification_protocols`` library. It provides predicates for
learning a classifier from a dataset object, using it to make
predictions, returning class probabilities, and exporting the learned
model as a list of predicate clauses or to a file.

Datasets are represented as objects implementing the
``dataset_protocol`` protocol from the ``classification_protocols``
library. Existing datasets in ``classification_protocols/test_datasets``
can be used for binary, multiclass, continuous, categorical, and
mixed-feature testing.

API documentation
-----------------

Open the
`../../docs/library_index.html#logistic_regression_classifier <../../docs/library_index.html#logistic_regression_classifier>`__
link in a web browser.

Loading
-------

To load this library, load the ``loader.lgt`` file:

::

   | ?- logtalk_load(logistic_regression_classifier(loader)).

Testing
-------

To test this library predicates, load the ``tester.lgt`` file:

::

   | ?- logtalk_load(logistic_regression_classifier(tester)).

To run the performance benchmark suite, load the
``tester_performance.lgt`` file:

::

   | ?- logtalk_load(logistic_regression_classifier(tester_performance)).

Features
--------

- **Binary and Multiclass Classification**: Learns a joint softmax
  logistic model with one parameter vector per class.
- **Continuous Features**: Standardizes numeric attributes using z-score
  scaling derived from the training data.
- **Categorical Features**: Expands discrete attributes using one-hot
  encoding based on the declared dataset attribute values and rejects
  unseen values with a domain error.
- **Missing Values**: Encodes missing numeric and categorical values
  represented using anonymous variables using explicit missing-value
  indicator features instead of being conflated with baseline feature
  values.
- **Unknown values**: Prediction requests containing categorical values
  that are not declared by the dataset raise a domain error instead of
  being silently mapped into an existing feature bucket.
- **Probability Estimation**: Provides class probability distributions
  in addition to class predictions.
- **Classifier Export**: Learned classifiers can be exported as
  predicate clauses or written to a file.
- **Reference Benchmarks**: Includes a dedicated performance suite
  covering the ``weather``, ``mixed``, ``iris_small``,
  ``missing_mixed``, and ``breast_cancer`` datasets with reported
  training time, training accuracy, and mean log loss.

Options
-------

The ``learn/3`` predicate supports these options:

- ``learning_rate/1`` - gradient descent learning rate (default:
  ``0.1``)
- ``maximum_iterations/1`` - maximum number of optimization iterations
  (default: ``1000``)
- ``tolerance/1`` - convergence threshold for the maximum parameter
  update (default: ``1.0e-6``)
- ``l2_regularization/1`` - L2 penalty factor applied to weights
  (default: ``0.0``)

Usage
-----

Learning a classifier
~~~~~~~~~~~~~~~~~~~~~

::

   | ?- logistic_regression_classifier::learn(weather, Classifier).

   | ?- logistic_regression_classifier::learn(iris_small, Classifier, [learning_rate(0.05), maximum_iterations(1500)]).

Making predictions
~~~~~~~~~~~~~~~~~~

::

   | ?- logistic_regression_classifier::learn(mixed, Classifier),
        logistic_regression_classifier::predict(Classifier, [age-45, income-75000, student-no, credit_rating-fair], Class).

   | ?- logistic_regression_classifier::learn(iris_small, Classifier),
        logistic_regression_classifier::predict_probabilities(Classifier, [sepal_length-6.4, sepal_width-3.0, petal_length-5.8, petal_width-2.2], Probabilities).

Exporting the classifier
~~~~~~~~~~~~~~~~~~~~~~~~

::

   | ?- logistic_regression_classifier::learn(weather, Classifier),
        logistic_regression_classifier::export_to_clauses(weather, Classifier, classify, Clauses).

   | ?- logistic_regression_classifier::learn(weather, Classifier),
        logistic_regression_classifier::export_to_file(weather, Classifier, classify, 'classifier.pl').

Classifier representation
-------------------------

The learned classifier is represented as a compound term with the form:

::

   lr_classifier(Classes, Encoders, Models, Options)

Where:

- ``Classes``: list of class labels
- ``Encoders``: list of continuous scaling descriptors and categorical
  value lists
- ``Models``: list of ``class_model(Class, Bias, Weights)`` terms
- ``Options``: merged training options used to learn the model

When exported using ``export_to_clauses/4`` or ``export_to_file/4``,
this classifier term is serialized directly as the single argument of
the generated predicate clause so that the exported model can be loaded
and reused as-is.

References
----------

1. Hosmer, D.W., Lemeshow, S. and Sturdivant, R.X. (2013). "Applied
   Logistic Regression".
2. Bishop, C.M. (2006). "Pattern Recognition and Machine Learning".
   Chapter 4.
3. Hastie, T., Tibshirani, R. and Friedman, J. (2009). "The Elements of
   Statistical Learning". Chapter 4.
