ridge_regression

Ridge regression regressor supporting continuous and mixed-feature datasets. The library implements the regressor_protocol defined in the regression_protocols library and learns a linear model by solving the weighted ridge normal equations directly, via the shared regression encoding core in the regressor_common library. The intercept is left unpenalized, while encoded feature columns are penalized using scale-aware weights equivalent to standardizing each penalized column before applying the L2 penalty.
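The solve described above can be sketched as follows (a minimal Python illustration of the underlying math, not the library's Logtalk implementation; all names, and the exact penalty scaling convention, are assumptions):

```python
# Sketch of the weighted ridge normal equations with an unpenalized
# intercept and scale-aware penalty weights. Penalizing column j with
# lambda * var_j is equivalent to standardizing that column before
# applying a plain L2 penalty (the standardized coefficient w'_j relates
# to the raw one by w'_j = s_j * w_j, so lambda * w'_j^2 = lambda * s_j^2 * w_j^2).

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def solve(A, b):
    """Gaussian elimination with partial pivoting (the direct solver sketched here)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def ridge_fit(X, y, lam):
    """Return [bias, w1, ..., wd] minimizing the weighted ridge objective."""
    n, d = len(X), len(X[0])
    # Augment with a constant column for the (unpenalized) intercept.
    Z = [[1.0] + row for row in X]
    # Normal equations: (Z'Z + P) beta = Z'y, where P is zero for the
    # intercept and lambda * var_j on the diagonal for feature column j.
    A = [[sum(Z[i][r] * Z[i][c] for i in range(n)) for c in range(d + 1)]
         for r in range(d + 1)]
    for j in range(1, d + 1):
        A[j][j] += lam * variance([row[j - 1] for row in X])
    rhs = [sum(Z[i][r] * y[i] for i in range(n)) for r in range(d + 1)]
    return solve(A, rhs)
```

With lam set to 0 this reduces to ordinary least squares; increasing lam shrinks the feature weights while leaving the intercept free.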

API documentation

Open the ../../apis/library_index.html#ridge_regression link in a web browser.

Loading

To load this library, load the loader.lgt file:

| ?- logtalk_load(ridge_regression(loader)).

Testing

To test this library's predicates, load the tester.lgt file:

| ?- logtalk_load(ridge_regression(tester)).

To run the performance benchmark suite, load the tester_performance.lgt file:

| ?- logtalk_load(ridge_regression(tester_performance)).

Features

  • Continuous and Mixed Features: Supports continuous attributes and categorical attributes encoded using reference-level dummy coding.

  • Feature Scaling and Penalty Scaling: Continuous attributes can be standardized using z-score scaling. Ridge regularization uses scale-aware weights equivalent to standardizing each penalized encoded feature column before applying the L2 penalty.

  • Missing Values: Missing numeric and categorical values represented using anonymous variables are encoded using explicit missing-value indicator features.

  • Unknown Values: Prediction requests containing categorical values that are not declared by the dataset raise a domain error.

  • Zero-Variance Features: Encoded columns with zero variance are excluded from the direct solve and assigned zero coefficients in the learned regressor.

  • Ridge Penalty: Applies L2 regularization to the learned weights using the shared regularization/1 option.

  • Diagnostics Metadata: Learned regressors record model name, target, training example count, solver, linear-system residual, active feature count, penalty scaling strategy, encoded feature count, and effective options, accessible using the shared regression diagnostics predicates.

  • Model Export: Learned regressors can be exported as predicate clauses or written to a file.

  • Reference Benchmarks: Includes a dedicated performance suite reporting training time, RMSE, and MAE for representative regression datasets.
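The zero-variance handling listed above can be sketched in Python (an illustration of the idea only; the helper names are assumptions, not the library's internals):

```python
# Constant encoded columns carry no information for the solve: they are
# excluded from the direct solve and reappear as zero coefficients in
# the final weight vector of the learned regressor.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def split_active(columns):
    """Partition column indices into active (non-constant) and dropped."""
    active = [j for j, col in enumerate(columns) if variance(col) > 0.0]
    dropped = [j for j in range(len(columns)) if j not in active]
    return active, dropped

def expand_weights(active, total, active_weights):
    """Scatter the solved weights back, zero-filling dropped columns."""
    weights = [0.0] * total
    for j, w in zip(active, active_weights):
        weights[j] = w
    return weights
```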

Regressor representation

The learned regressor is represented by default as:

  • ridge_regressor(Encoders, Bias, Weights, Diagnostics)

Exported predicate clauses use the same shape, with a user-specified functor:

  • Functor(Encoders, Bias, Weights, Diagnostics)

Diagnostics syntax

The diagnostics/2 predicate returns a list of metadata terms with the form:

[
    model(ridge_regression),
    target(Target),
    training_example_count(TrainingExampleCount),
    options(Options),
    solver(Solver),
    linear_system_residual(Residual),
    active_feature_count(ActiveFeatureCount),
    penalty_scaling(encoded_feature_standardization),
    encoded_feature_count(FeatureCount)
]

Where:

  • model(ridge_regression) identifies the learning algorithm that produced the regressor.

  • target(Target) stores the target attribute name declared by the training dataset.

  • training_example_count(TrainingExampleCount) stores the number of examples used during training.

  • options(Options) stores the effective learning options after merging the user options with the library defaults.

  • solver(Solver) records the direct linear-system solver used during training. The current value is pivoted_gaussian_elimination.

  • linear_system_residual(Residual) stores the maximum absolute residual of the solved ridge linear system.

  • active_feature_count(ActiveFeatureCount) stores the number of encoded feature columns retained for the direct solve after dropping zero-variance columns.

  • penalty_scaling(encoded_feature_standardization) records that the ridge penalty is scaled as if each penalized encoded feature column had been standardized before applying the L2 penalty.

  • encoded_feature_count(FeatureCount) stores the number of numeric features induced by the encoder list, including missing-value indicator features.

Use the regression_protocols diagnostic/2 and regressor_options/2 helper predicates when you only need a single metadata term or the effective options.
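The linear_system_residual diagnostic can be understood as the maximum absolute component of A*x - b for the solved ridge system. A small sketch of that metric (illustrative only, not the library's code):

```python
# Maximum absolute residual of a solved linear system A x = b:
# a small residual indicates the direct solve was numerically accurate.

def max_abs_residual(A, x, b):
    residuals = [sum(a * v for a, v in zip(row, x)) - bi
                 for row, bi in zip(A, b)]
    return max(abs(r) for r in residuals)
```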

Options

The learn/3 predicate accepts the following options:

  • regularization/1: Ridge penalty coefficient applied to the weight vector during the direct solve. Higher values increase shrinkage and can reduce overfitting. The default is 0.01.

  • feature_scaling/1: Controls z-score standardization of continuous attributes before training and prediction. Accepted values are true and false. The default is true.
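The shrinkage effect of the regularization/1 option can be seen in the one-dimensional, centered case, where the ridge solution has a closed form (a minimal sketch of the math, not library code):

```python
# For centered data, the 1-D ridge weight is
#   w = sum(x*y) / (sum(x*x) + lam),
# so a larger penalty coefficient shrinks the learned weight toward zero.

def ridge_1d(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)
```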