lasso_regression
A lasso regression regressor supporting continuous and mixed-feature datasets.
The library implements the regressor_protocol defined in the
regression_protocols library and learns a linear model using cyclic
coordinate descent with soft-thresholding updates for each encoded
feature, minimizing mean squared error plus an L1 penalty on the
encoded coefficient vector.
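Each coordinate update relies on the standard soft-thresholding operator. As a minimal sketch (the predicate name below is illustrative only and not part of the library API):

```logtalk
% soft_threshold(+Z, +Gamma, -W)
%
% Standard soft-thresholding operator S(Z, Gamma) = sign(Z) * max(|Z| - Gamma, 0).
% A coordinate-descent sweep applies it to the unpenalized least-squares
% solution for each encoded feature in turn, shrinking small coefficients
% to exactly zero, which is what produces sparse lasso models.
soft_threshold(Z, Gamma, W) :-
    (   Z > Gamma   -> W is Z - Gamma
    ;   Z < -Gamma  -> W is Z + Gamma
    ;   W is 0.0
    ).
```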
API documentation
Open the ../../apis/library_index.html#lasso_regression link in a web browser.
Loading
To load this library, load the loader.lgt file:
| ?- logtalk_load(lasso_regression(loader)).
Testing
To test this library's predicates, load the tester.lgt file:
| ?- logtalk_load(lasso_regression(tester)).
To run the performance benchmark suite, load the
tester_performance.lgt file:
| ?- logtalk_load(lasso_regression(tester_performance)).
Features
Continuous and Mixed Features: Supports continuous attributes and categorical attributes
Categorical Attributes Encoding: Uses reference-level dummy coding derived from the declared dataset attribute values, with a missing-value indicator, and the resulting encoded coefficients are regularized independently.
Feature Scaling: Continuous attributes can be standardized using z-score scaling.
Missing Values: Missing numeric and categorical values represented using anonymous variables are encoded using explicit missing-value indicator features.
Unknown Values: Prediction requests containing categorical values that are not declared by the dataset raise a domain error.
Coefficient-wise L1 Shrinkage: Applies soft-thresholding updates independently to every encoded feature, including categorical dummy and missing-indicator features.
Diagnostics Metadata: Learned regressors record model name, target, training example count, optimization stop reason, completed iterations, final parameter delta, encoded feature count, and effective options, accessible using the shared regression diagnostics predicates.
Model Export: Learned regressors can be exported as predicate clauses or written to a file.
Reference Benchmarks: Includes a dedicated performance suite reporting training time, RMSE, and MAE for representative regression datasets.
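As an illustration of the categorical encoding described above (the attribute and value names are made up for this example and are not part of the library):

```logtalk
% A categorical attribute color with declared values [red, green, blue]
% would, under reference-level dummy coding with red as the reference
% level, induce three encoded numeric features:
%
%   color=green   - 1.0 when the value is green, 0.0 otherwise
%   color=blue    - 1.0 when the value is blue, 0.0 otherwise
%   color=missing - 1.0 when the value is an anonymous variable, 0.0 otherwise
%
% Each of these encoded features receives its own coefficient and is
% regularized independently by the L1 penalty.
```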
Regressor representation
The learned regressor is represented by default as:
lasso_regressor(Encoders, Bias, Weights, Diagnostics)
The exported predicate clauses therefore use the shape:
Functor(Encoders, Bias, Weights, Diagnostics)
In this representation, Encoders stores the feature encoding
metadata, Bias stores the intercept, Weights stores one
coefficient per encoded feature, and Diagnostics stores training
metadata including the effective options.
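For example, the components of a learned regressor can be accessed by unifying with the default shape. In the hypothetical query below, the dataset object name and the learn/3 argument order (dataset, options, regressor) are assumptions; consult the regressor_protocol declarations for the actual signature:

```logtalk
| ?- lasso_regression::learn(house_prices, [], Regressor),
     Regressor = lasso_regressor(_Encoders, Bias, Weights, _Diagnostics),
     list::length(Weights, EncodedFeatureCount).
```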
Diagnostics syntax
The diagnostics/2 predicate returns a list of metadata terms with
the form:
[
model(lasso_regression),
target(Target),
training_example_count(TrainingExampleCount),
options(Options),
convergence(Status),
iterations(Iterations),
final_delta(FinalDelta),
encoded_feature_count(FeatureCount)
]
Where:
- model(lasso_regression): Identifies the learning algorithm that produced the regressor.
- target(Target): Stores the target attribute name declared by the training dataset.
- training_example_count(TrainingExampleCount): Stores the number of examples used during training.
- options(Options): Stores the effective learning options after merging the user options with the library defaults.
- convergence(Status): Records the optimization stop condition. The current values are tolerance, when the maximum Karush-Kuhn-Tucker optimality violation across the intercept and all encoded features is within the configured tolerance, and maximum_iterations_exhausted, when training stops because the iteration cap is reached.
- iterations(Iterations): Stores the number of coordinate-descent sweeps completed during training.
- final_delta(FinalDelta): Stores the maximum Karush-Kuhn-Tucker optimality violation measured during the final optimization check.
- encoded_feature_count(FeatureCount): Stores the number of numeric features induced by the encoder list, including missing-value indicator features.
Use the regression_protocols diagnostic/2 and
regressor_options/2 helper predicates when you only need a single
metadata term or the effective options.
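As a hypothetical session sketch (the dataset object name and the learn/3 argument order are assumptions; the helper predicates' calling conventions are defined by the regression_protocols library):

```logtalk
| ?- lasso_regression::learn(house_prices, [], Regressor),
     lasso_regression::diagnostic(Regressor, iterations(Iterations)),
     lasso_regression::regressor_options(Regressor, Options).
```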
Options
The learn/3 predicate accepts the following options:
- maximum_iterations/1: Maximum number of coordinate-descent sweeps to run before stopping even if the tolerance criterion has not been met. The default is 2000.
- tolerance/1: Convergence threshold for the maximum Karush-Kuhn-Tucker optimality violation in a full coordinate-descent sweep. Training stops early when both the intercept condition and all encoded-feature subgradient conditions are satisfied within this value. The default is 1.0e-7.
- regularization/1: L1 penalty coefficient applied independently to every encoded feature during optimization. Higher values increase shrinkage and can reduce overfitting. The default is 0.01.
- feature_scaling/1: Controls z-score standardization of continuous attributes before training and prediction. Accepted values are true and false. The default is true.
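The options above can be passed as a list when learning. In this hypothetical query, the dataset object name and the learn/3 argument order are assumptions; any option omitted from the list keeps its default value:

```logtalk
| ?- lasso_regression::learn(
         house_prices,
         [maximum_iterations(5000), tolerance(1.0e-8), regularization(0.1)],
         Regressor
     ).
```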