.. _library_ridge_regression:

``ridge_regression``
====================

Ridge regression regressor supporting continuous and mixed-feature
datasets. The library implements the ``regressor_protocol`` defined in
the ``regression_protocols`` library. It learns a linear model by
solving the weighted ridge normal equations directly, building on the
shared regression encoding core in ``regressor_common``. The intercept
is left unpenalized, while encoded feature columns are penalized using
scale-aware weights, which is equivalent to standardizing each penalized
column before applying the L2 penalty.
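
As a sketch of what this equivalence implies, and not a description of
the library internals, the linear system can be written as follows. The
notation is illustrative and does not come from the library: ``X_c`` is
the centered matrix of encoded feature columns, ``y_c`` the centered
target vector, ``mu`` the vector of column means, ``s_j`` the standard
deviation of column ``j``, ``lambda`` the ``regularization/1`` value,
``w`` the weights, and ``b`` the intercept. The normalization of
``lambda`` relative to the number of training examples and the variance
estimator used for ``s_j`` are assumptions that this document does not
specify.

.. math::

   \left(X_c^{\top} X_c + \lambda D\right) w = X_c^{\top} y_c,
   \qquad D = \operatorname{diag}\left(s_1^2, \ldots, s_p^2\right),
   \qquad b = \bar{y} - \mu^{\top} w

Under these assumptions, substituting :math:`u_j = s_j w_j` turns the
penalty :math:`\lambda \sum_j s_j^2 w_j^2` into the plain L2 penalty
:math:`\lambda \sum_j u_j^2` on the coefficients of the standardized
columns, and the intercept does not appear in the penalty.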

API documentation
-----------------

Open the
`../../apis/library_index.html#ridge_regression <../../apis/library_index.html#ridge_regression>`__
link in a web browser.

Loading
-------

To load this library, load the ``loader.lgt`` file:

::

   | ?- logtalk_load(ridge_regression(loader)).

Testing
-------

To test this library's predicates, load the ``tester.lgt`` file:

::

   | ?- logtalk_load(ridge_regression(tester)).

To run the performance benchmark suite, load the
``tester_performance.lgt`` file:

::

   | ?- logtalk_load(ridge_regression(tester_performance)).

Features
--------

- **Continuous and Mixed Features**: Supports continuous attributes and
  categorical attributes encoded using reference-level dummy coding.

- **Feature Scaling and Penalty Scaling**: Continuous attributes can be
  standardized using z-score scaling. Ridge regularization uses
  scale-aware weights equivalent to standardizing each penalized encoded
  feature column before applying the L2 penalty.

- **Missing Values**: Missing numeric and categorical values represented
  using anonymous variables are encoded using explicit missing-value
  indicator features.

- **Unknown Values**: Prediction requests containing categorical values
  that are not declared by the dataset raise a domain error.

- **Zero-Variance Features**: Encoded columns with zero variance are
  excluded from the direct solve and assigned zero coefficients in the
  learned regressor.

- **Ridge Penalty**: Applies L2 regularization to the learned weights,
  with the penalty strength set by the shared ``regularization/1``
  option.

- **Diagnostics Metadata**: Learned regressors record model name,
  target, training example count, solver, linear-system residual, active
  feature count, penalty scaling strategy, encoded feature count, and
  effective options, accessible using the shared regression diagnostics
  predicates.

- **Model Export**: Learned regressors can be exported as predicate
  clauses or written to a file.

- **Reference Benchmarks**: Includes a dedicated performance suite
  reporting training time, RMSE, and MAE for representative regression
  datasets.

Regressor representation
------------------------

The learned regressor is represented by default as:

- ``ridge_regressor(Encoders, Bias, Weights, Diagnostics)``

The exported predicate clauses therefore use the shape:

- ``Functor(Encoders, Bias, Weights, Diagnostics)``
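
For illustration only, assume a regressor was exported using the
hypothetical functor ``my_ridge_model`` (this name is not defined by the
library). The exported predicate then has four arguments, and a term
with the default shape can be rebuilt from it by unification:

::

   % my_ridge_model/4 is a hypothetical exported predicate
   | ?- my_ridge_model(Encoders, Bias, Weights, Diagnostics),
        Regressor = ridge_regressor(Encoders, Bias, Weights, Diagnostics).

Whether a term rebuilt this way can be passed directly to the library
predicates is an assumption; consult the API documentation for the
supported way of loading exported regressors.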

Diagnostics syntax
------------------

The ``diagnostics/2`` predicate returns a list of metadata terms with
the form:

::

   [
       model(ridge_regression),
       target(Target),
       training_example_count(TrainingExampleCount),
       options(Options),
       solver(Solver),
       linear_system_residual(Residual),
       active_feature_count(ActiveFeatureCount),
       penalty_scaling(encoded_feature_standardization),
       encoded_feature_count(FeatureCount)
   ]

Where:

- ``model(ridge_regression)`` identifies the learning algorithm that
  produced the regressor.
- ``target(Target)`` stores the target attribute name declared by the
  training dataset.
- ``training_example_count(TrainingExampleCount)`` stores the number of
  examples used during training.
- ``options(Options)`` stores the effective learning options after
  merging the user options with the library defaults.
- ``solver(Solver)`` records the direct linear-system solver used during
  training. The current value is ``pivoted_gaussian_elimination``.
- ``linear_system_residual(Residual)`` stores the maximum absolute
  residual of the solved ridge linear system.
- ``active_feature_count(ActiveFeatureCount)`` stores the number of
  encoded feature columns retained for the direct solve after dropping
  zero-variance columns.
- ``penalty_scaling(encoded_feature_standardization)`` records that the
  ridge penalty is scaled as if each penalized encoded feature column
  had been standardized before applying the L2 penalty.
- ``encoded_feature_count(FeatureCount)`` stores the number of numeric
  features induced by the encoder list, including missing-value
  indicator features.

Use the ``regression_protocols`` ``diagnostic/2`` and
``regressor_options/2`` helper predicates when you only need a single
metadata term or the effective options.
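
A minimal sketch of querying this metadata follows. It makes several
assumptions that are not confirmed here: ``my_dataset`` is a
hypothetical dataset object, the predicates are called as messages to
the ``ridge_regression`` object, ``learn/3`` takes the dataset, the
regressor, and the options in that order, and the diagnostics predicates
take the regressor as their first argument. Check the API documentation
for the actual signatures.

::

   % my_dataset is a hypothetical dataset object; argument orders are assumed
   | ?- ridge_regression::learn(my_dataset, Regressor, []),
        ridge_regression::diagnostics(Regressor, Diagnostics),
        ridge_regression::diagnostic(Regressor, solver(Solver)),
        ridge_regression::regressor_options(Regressor, Options).

If these assumptions hold, ``Solver`` would be bound to
``pivoted_gaussian_elimination`` and ``Options`` to the effective
options, including defaults for any option not passed to ``learn/3``.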

Options
-------

The ``learn/3`` predicate accepts the following options:

- ``regularization/1``: Ridge penalty coefficient applied to the weight
  vector during the direct solve. Higher values increase shrinkage and
  can reduce overfitting. The default is ``0.01``.
- ``feature_scaling/1``: Controls z-score standardization of continuous
  attributes before training and prediction. Accepted values are
  ``true`` and ``false``. The default is ``true``.
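
As a hedged example, the following query overrides both defaults. The
dataset object ``my_dataset`` is hypothetical, and the argument order of
``learn/3`` (dataset, regressor, options) is an assumption to be checked
against the API documentation:

::

   % my_dataset is a hypothetical dataset object; argument order is assumed
   | ?- ridge_regression::learn(my_dataset, Regressor, [
            regularization(0.1),
            feature_scaling(false)
        ]).

Options omitted from the list keep their default values, so passing an
empty list trains with ``regularization(0.01)`` and
``feature_scaling(true)``.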
