logistic_regression_classifier
Logistic regression classifier supporting both binary and multiclass classification. Multiclass classification is implemented using batch gradient descent to train a single multiclass softmax model. Binary classification is treated as a two-class special case of the same objective.
The library implements the classifier_protocol defined in the
classification_protocols library. It provides predicates for
learning a classifier from a dataset object, using it to make
predictions, returning class probabilities, and exporting the learned
model as a list of predicate clauses or to a file.
Datasets are represented as objects implementing the
dataset_protocol protocol from the classification_protocols
library. Existing datasets in classification_protocols/test_datasets
can be used for binary, multiclass, continuous, categorical, and
mixed-feature testing.
API documentation
Open the ../../docs/library_index.html#logistic_regression_classifier link in a web browser.
Loading
To load this library, load the loader.lgt file:
| ?- logtalk_load(logistic_regression_classifier(loader)).
Testing
To test this library's predicates, load the tester.lgt file:
| ?- logtalk_load(logistic_regression_classifier(tester)).
To run the performance benchmark suite, load the
tester_performance.lgt file:
| ?- logtalk_load(logistic_regression_classifier(tester_performance)).
Features
Binary and Multiclass Classification: Learns a joint softmax logistic model with one parameter vector per class.
Continuous Features: Standardizes numeric attributes using z-score scaling derived from the training data.
Categorical Features: Expands discrete attributes using one-hot encoding based on the declared dataset attribute values and rejects unseen values with a domain error.
Missing Values: Missing numeric and categorical values, represented as anonymous variables, are encoded with explicit missing-value indicator features rather than being conflated with baseline feature values.
Unknown Values: Prediction requests containing categorical values that are not declared by the dataset raise a domain error instead of being silently mapped into an existing feature bucket.
Probability Estimation: Provides class probability distributions in addition to class predictions.
Classifier Export: Learned classifiers can be exported as predicate clauses or written to a file.
Reference Benchmarks: Includes a dedicated performance suite covering the weather, mixed, iris_small, missing_mixed, and breast_cancer datasets, reporting training time, training accuracy, and mean log loss.
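The missing-value and unknown-value behavior listed above can be exercised with queries along the following lines. This is only a sketch under stated assumptions: it reuses the mixed dataset and the attribute names from the usage examples, assumes that an anonymous variable is also accepted as a missing value at prediction time, assumes that maybe is not a declared value of the student attribute, and assumes that the error follows the standard error(domain_error(Domain, Culprit), Context) shape.

```logtalk
% Missing numeric value: the age attribute is left unbound (assumed
% to be accepted in prediction requests as well as in datasets)
| ?- logistic_regression_classifier::learn(mixed, Classifier),
     logistic_regression_classifier::predict(Classifier, [age-_, income-75000, student-no, credit_rating-fair], Class).

% Unknown categorical value: "maybe" is a hypothetical undeclared value,
% so the call is expected to raise a domain error that we catch here
| ?- logistic_regression_classifier::learn(mixed, Classifier),
     catch(
         logistic_regression_classifier::predict(Classifier, [age-45, income-75000, student-maybe, credit_rating-fair], Class),
         error(domain_error(Domain, Culprit), _),
         true
     ).
```

If the second query behaves as described, Class stays unbound while Domain and Culprit are bound by the caught error term.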
Options
The learn/3 predicate supports these options:
learning_rate/1 - gradient descent learning rate (default: 0.1)
maximum_iterations/1 - maximum number of optimization iterations (default: 1000)
tolerance/1 - convergence threshold for the maximum parameter update (default: 1.0e-6)
l2_regularization/1 - L2 penalty factor applied to weights (default: 0.0)
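To make the role of each option concrete, the textbook form of regularized softmax regression trained by batch gradient descent is sketched below. The notation is an assumption, not taken from the library source: x is the encoded feature vector, K the number of classes, N the number of training examples, \eta the learning_rate/1 value, \lambda the l2_regularization/1 value, \varepsilon the tolerance/1 value, and T the maximum_iterations/1 value. The exact scaling the library applies to the loss and to the penalty term may differ.

```latex
% Softmax class probabilities with one weight vector and bias per class
P(y = k \mid x) = \frac{\exp(w_k^\top x + b_k)}{\sum_{j=1}^{K} \exp(w_j^\top x + b_j)}

% Mean log loss with an L2 penalty on the weights only (assumed form)
J = -\frac{1}{N} \sum_{i=1}^{N} \log P(y = y_i \mid x_i)
    + \frac{\lambda}{2} \sum_{k=1}^{K} \lVert w_k \rVert^2

% One batch gradient descent step; [y_i = k] is 1 when example i has class k, else 0
w_k \leftarrow w_k - \eta \left( \frac{1}{N} \sum_{i=1}^{N}
    \bigl( P(y = k \mid x_i) - [y_i = k] \bigr) x_i + \lambda w_k \right)
b_k \leftarrow b_k - \eta \, \frac{1}{N} \sum_{i=1}^{N}
    \bigl( P(y = k \mid x_i) - [y_i = k] \bigr)

% Stop when the largest parameter change falls below the tolerance,
% or after T iterations, whichever comes first
\max_{\theta \in \{w, b\}} \lvert \Delta \theta \rvert < \varepsilon
```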
Usage
Learning a classifier
| ?- logistic_regression_classifier::learn(weather, Classifier).
| ?- logistic_regression_classifier::learn(iris_small, Classifier, [learning_rate(0.05), maximum_iterations(1500)]).
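As a further sketch, all four documented options can be passed together. The option values below are illustrative only and are not tuned recommendations for this dataset.

```logtalk
% Slower learning rate, more iterations, a tighter convergence
% threshold, and a small L2 penalty (all values are illustrative)
| ?- logistic_regression_classifier::learn(iris_small, Classifier, [
         learning_rate(0.05),
         maximum_iterations(2000),
         tolerance(1.0e-8),
         l2_regularization(0.01)
     ]).
```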
Making predictions
| ?- logistic_regression_classifier::learn(mixed, Classifier),
     logistic_regression_classifier::predict(Classifier, [age-45, income-75000, student-no, credit_rating-fair], Class).
| ?- logistic_regression_classifier::learn(iris_small, Classifier),
     logistic_regression_classifier::predict_probabilities(Classifier, [sepal_length-6.4, sepal_width-3.0, petal_length-5.8, petal_width-2.2], Probabilities).
Exporting the classifier
| ?- logistic_regression_classifier::learn(weather, Classifier),
     logistic_regression_classifier::export_to_clauses(weather, Classifier, classify, Clauses).
| ?- logistic_regression_classifier::learn(weather, Classifier),
     logistic_regression_classifier::export_to_file(weather, Classifier, classify, 'classifier.pl').
Classifier representation
The learned classifier is represented as a compound term with the form:
lr_classifier(Classes, Encoders, Models, Options)
Where:
Classes: list of class labels
Encoders: list of continuous scaling descriptors and categorical value lists
Models: list of class_model(Class, Bias, Weights) terms
Options: merged training options used to learn the model
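Because the classifier is an ordinary compound term, its parts can be inspected by unification. The query below is a minimal sketch that relies only on the term shape given above. The variable names are arbitrary, and the internal structure of the Encoders and Weights arguments is not assumed.

```logtalk
% Take the learned term apart and look at the first per-class model
| ?- logistic_regression_classifier::learn(weather, Classifier),
     Classifier = lr_classifier(Classes, Encoders, Models, Options),
     Models = [class_model(FirstClass, Bias, Weights)| _].
```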
When exported using export_to_clauses/4 or export_to_file/4,
this classifier term is serialized directly as the single argument of
the generated predicate clause so that the exported model can be loaded
and reused as-is.
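A possible round trip is sketched below. It assumes that the exported file holds a single classify/1 clause that can be loaded as plain Prolog with consult/1, and it reuses the mixed dataset and attribute names from the usage examples. Depending on the backend, the file may need to be loaded differently.

```logtalk
% Train once and save the classifier term to a file
| ?- logistic_regression_classifier::learn(mixed, Classifier),
     logistic_regression_classifier::export_to_file(mixed, Classifier, classify, 'classifier.pl').

% Later session: load the file, retrieve the term, predict without retraining
| ?- consult('classifier.pl'),
     classify(Classifier),
     logistic_regression_classifier::predict(Classifier, [age-45, income-75000, student-no, credit_rating-fair], Class).
```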
References
Hosmer, D.W., Lemeshow, S. and Sturdivant, R.X. (2013). “Applied Logistic Regression”.
Bishop, C.M. (2006). “Pattern Recognition and Machine Learning”. Chapter 4.
Hastie, T., Tibshirani, R. and Friedman, J. (2009). “The Elements of Statistical Learning”. Chapter 4.