bayesian_ridge_regression
A Bayesian ridge regressor supporting continuous and
mixed-feature datasets. The library implements the
regressor_protocol defined in the regression_protocols library
and learns a Bayesian linear model by maximizing the evidence over the
global weight and noise precisions, with Gamma hyperpriors over
both precision terms.
API documentation
Open the ../../apis/library_index.html#bayesian_ridge_regression link in a web browser.
Loading
To load this library, load the loader.lgt file:
| ?- logtalk_load(bayesian_ridge_regression(loader)).
Testing
To test this library's predicates, load the tester.lgt file:
| ?- logtalk_load(bayesian_ridge_regression(tester)).
To run the performance benchmark suite, load the
tester_performance.lgt file:
| ?- logtalk_load(bayesian_ridge_regression(tester_performance)).
Features
Continuous and Mixed Features: Supports continuous attributes and categorical attributes encoded using reference-level dummy coding from the declared dataset attribute values.
Automatic Hyperparameter Tuning: Learns the global coefficient precision and observation-noise precision using MacKay-style evidence maximization with configurable Gamma hyperpriors instead of a user-supplied ridge penalty.
Posterior Uncertainty: Exposes predictive Gaussian distributions using coefficient posterior uncertainty plus observation noise, matching the usual scikit-learn BayesianRidge treatment where the intercept is not probabilistic. Posterior coefficient variances are also exposed.
Feature Scaling: Continuous attributes can be standardized using z-score scaling before fitting and prediction.
Stable Posterior Solves: Evidence-maximization updates clamp the learned weight and noise precisions to configurable precision_bounds(Min, Max) bounds to avoid degenerate zero or infinite precision estimates. Posterior solves use Cholesky factorization of positive-definite precision matrices, and diagnostics report any diagonal jitter applied when factorization retries are needed. The evidence-maximization loop computes the effective degrees of freedom from a one-time eigenspectrum of the centered Gram surrogate while still switching to a sample-space solve when the active encoded feature count exceeds the number of training rows.
Missing Values: Missing numeric and categorical values represented using anonymous variables are encoded using explicit missing-value indicator features.
Unknown Values: Prediction requests containing categorical values that are not declared by the dataset raise a domain error.
Zero-Variance Features: Encoded columns with zero variance are excluded from posterior updates and assigned zero mean and zero posterior variance.
Diagnostics Metadata: Learned regressors record model name, target, training example count, Cholesky stabilization attempts and applied jitter, Gamma hyperpriors for both precision terms, effective precision bounds, learned precisions, learned noise variance, final log evidence, the full log-evidence score trace, active feature count, posterior variances, intercept treatment, convergence metric and status, encoded feature count, and effective options.
Model Export: Learned regressors can be exported as predicate clauses or written to a file.
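As a sketch of how these features come together, the following hypothetical session trains a regressor with default options and then inspects its metadata. Only learn/3, its options, and diagnostics/2 are documented by this library; the target attribute name and the learn/3 argument order shown here are illustrative assumptions.

```logtalk
% Hypothetical session; "price" and the learn/3 argument order
% (target attribute, options, learned regressor) are assumptions.
| ?- bayesian_ridge_regression::learn(price, [], Regressor),
     bayesian_ridge_regression::diagnostics(Regressor, Diagnostics).
```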
Regressor representation
The learned regressor is represented by default as:
bayesian_ridge_regressor(Encoders, Bias, Weights, ActiveFlags, PosteriorCovariance, NoiseVariance, Diagnostics)
The exported predicate clauses therefore use the shape:
Functor(Encoders, Bias, Weights, ActiveFlags, PosteriorCovariance, NoiseVariance, Diagnostics)
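For orientation, the same default representation is shown below with each argument annotated. The comments paraphrase the feature descriptions above; the reading of the noise variance as the reciprocal of the learned noise precision is the usual Bayesian ridge convention, assumed rather than stated by the library.

```logtalk
bayesian_ridge_regressor(
    Encoders,            % per-attribute encoders (dummy coding, z-score scaling)
    Bias,                % deterministic intercept (centering adjustment)
    Weights,             % posterior-mean coefficients for the encoded columns
    ActiveFlags,         % flags marking zero-variance columns as inactive
    PosteriorCovariance, % coefficient posterior covariance
    NoiseVariance,       % learned observation-noise variance (assumed 1/Beta)
    Diagnostics          % metadata list documented in the diagnostics section
)
```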
Diagnostics syntax
The diagnostics/2 predicate returns a list of metadata terms with
the form:
[
model(bayesian_ridge_regression),
target(Target),
training_example_count(TrainingExampleCount),
options(Options),
solver(cholesky_factorization),
stabilization_attempts(StabilizationAttempts),
stabilization_jitter(StabilizationJitter),
precision_bounds(MinimumPrecision, MaximumPrecision),
weight_precision_hyperprior(gamma(LambdaShape, LambdaRate)),
noise_precision_hyperprior(gamma(AlphaShape, AlphaRate)),
weight_precision(Alpha),
noise_precision(Beta),
noise_variance(NoiseVariance),
log_evidence(LogEvidence),
scores(Scores),
active_feature_count(ActiveFeatureCount),
weight_prior(isotropic_zero_mean_gaussian),
intercept_treatment(non_probabilistic),
bias_variance(BiasVariance),
weight_variances(WeightVariances),
convergence_metric(coefficient_l1),
convergence(Convergence),
iterations(Iterations),
final_delta(FinalDelta),
encoded_feature_count(FeatureCount)
]
The scores/1 diagnostic is analogous to scikit-learn scores_: it
stores the log marginal likelihood at the initial hyperparameters
followed by the value after each evidence-maximization update. The final
element is identical to log_evidence/1.
The bias_variance/1 diagnostic is always 0.0 because the
intercept is treated as a deterministic centering adjustment rather than
as a probabilistic parameter.
Use the regression_protocols diagnostic/2 and
regressor_options/2 helper predicates when you only need a single
metadata term or the effective options.
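For example, fetching a single metadata term or the effective options through those helpers might look as follows; the calling convention shown (learned regressor as first argument, requested term unified in the second) is an assumption for illustration.

```logtalk
% Hypothetical follow-up queries on a previously learned Regressor;
% the helper calling convention shown here is an assumption.
| ?- bayesian_ridge_regression::diagnostic(Regressor, noise_variance(NoiseVariance)).

| ?- bayesian_ridge_regression::regressor_options(Regressor, Options).
```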
Options
The learn/3 predicate accepts the following options:
maximum_iterations/1: Maximum number of evidence-maximization updates. The default is 300.
tolerance/1: Convergence tolerance on the L1 change between consecutive active coefficient vectors across evidence-maximization updates. The default is 1.0e-6.
initial_weight_precision/1: Positive initial value for the shared coefficient precision. The default is 1.0.
initial_noise_precision/1: Positive initial value for the observation-noise precision or auto to derive it from the target variance. The default is auto.
alpha_1/1: Non-negative shape hyperparameter of the Gamma prior over the learned observation-noise precision. The default is 1.0e-6.
alpha_2/1: Non-negative rate hyperparameter of the Gamma prior over the learned observation-noise precision. The default is 1.0e-6.
lambda_1/1: Non-negative shape hyperparameter of the Gamma prior over the learned coefficient precision. The default is 1.0e-6.
lambda_2/1: Non-negative rate hyperparameter of the Gamma prior over the learned coefficient precision. The default is 1.0e-6.
feature_scaling/1: Controls z-score standardization of continuous attributes before training and prediction. Accepted values are true and false. The default is true.
precision_bounds/2: Lower and upper positive bounds used to clamp the learned weight and noise precisions during evidence maximization for numerical stability. The default is precision_bounds(1.0e-12, 1.0e12).
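Putting several options together, a learn/3 call with tighter convergence and custom precision bounds might look like the sketch below. The options themselves are documented above; the target attribute name and the learn/3 argument order remain illustrative assumptions.

```logtalk
% Hypothetical call; "price" and the argument order are assumptions,
% while all option names and values are documented by this library.
| ?- bayesian_ridge_regression::learn(price, [
         maximum_iterations(500),
         tolerance(1.0e-8),
         initial_noise_precision(auto),
         feature_scaling(true),
         precision_bounds(1.0e-10, 1.0e10)
     ], Regressor).
```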