random_forest_classifier

Random Forest classifier using C4.5 decision trees as base learners. Builds an ensemble of decision trees trained on bootstrap samples with random feature subsets and combines their predictions through majority voting.

The library implements the classifier_protocol defined in the classification_protocols library. It provides predicates for learning an ensemble classifier from a dataset, using it to make predictions (with class probabilities), and exporting it as a list of predicate clauses or to a file.

Datasets are represented as objects implementing the dataset_protocol protocol from the classification_protocols library. See test_files directory for examples.

API documentation

Open the ../../docs/library_index.html#random_forest_classifier link in a web browser.

Loading

To load all entities in this library, load the loader.lgt file:

| ?- logtalk_load(random_forest_classifier(loader)).

Testing

To test this library's predicates, load the tester.lgt file:

| ?- logtalk_load(random_forest_classifier(tester)).

Features

  • Ensemble Learning: Combines multiple C4.5 decision trees for robust predictions.

  • Bootstrap Sampling: Each tree is trained on a bootstrap sample (random sample with replacement) of the training data.

  • Feature Randomization: Random subset of features selected for each tree (default: sqrt(TotalFeatures)).

  • Portable Seeded Sampling: Uses fast_random(xoshiro128pp) so bootstrap sampling and feature subset selection are portable and reproducible.

  • Majority Voting: Final predictions determined by voting across all trees.

  • Probability Estimation: Provides confidence scores based on vote proportions.

  • Configurable Options: Number of trees, maximum features per tree, and random seed via predicate options.

  • Classifier Export: Learned classifiers can be exported as predicate clauses.
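The ensemble mechanics listed above (bootstrap sampling, a sqrt-sized random feature subset, and vote-based prediction with vote-proportion probabilities) can be sketched in a language-agnostic way. The Python sketch below is purely illustrative; the library itself is implemented in Logtalk, and the helper names and the flooring of sqrt(TotalFeatures) are assumptions, not the library's actual code:

```python
import math
import random
from collections import Counter

def bootstrap_sample(rng, data):
    """Draw a bootstrap sample: same size as the data, sampled with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def random_feature_subset(rng, features):
    """Pick a random feature subset of size floor(sqrt(len(features))), at least 1."""
    k = math.isqrt(len(features)) or 1
    return rng.sample(features, k)

def majority_vote(predictions):
    """Combine per-tree predictions into a final class and vote proportions."""
    counts = Counter(predictions)
    total = len(predictions)
    probabilities = {cls: count / total for cls, count in counts.items()}
    winner, _ = counts.most_common(1)[0]
    return winner, probabilities
```

For example, with per-tree votes `['yes', 'yes', 'yes', 'no']`, `majority_vote/1` returns `yes` with a probability distribution of 0.75 for `yes` and 0.25 for `no`, mirroring how predict_probabilities/3 derives confidences from vote proportions.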

Options

The following options can be passed to the learn/3 predicate:

  • number_of_trees(N): Number of trees in the forest (default: 10).

  • maximum_features_per_tree(N): Maximum number of features to consider per tree (default: sqrt(TotalFeatures)).

  • random_seed(N): Positive integer seed used by the portable fast_random(xoshiro128pp) pseudo-random generator when drawing bootstrap samples and random feature subsets. Using the same seed with the same dataset and options reproduces the same learned classifier (default: 1357911).
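To illustrate how a fixed random_seed makes learning reproducible, here is a minimal sketch. Note that the library draws its samples from the portable xoshiro128++ generator provided by the fast_random library; this sketch uses Python's built-in generator only to show the seed-to-determinism idea, and the function name is an assumption:

```python
import random

def bootstrap_indices(seed, n):
    """Draw n bootstrap indices (with replacement) from a seeded generator."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]

# Re-running with the same seed reproduces exactly the same sample, so the
# same dataset, options, and seed yield the same learned forest.
```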

Classifier representation

The learned classifier is represented as a compound term:

rf_classifier(Trees, ClassValues, Options)

Where:

  • Trees: List of tree(C45Tree, AttributeNames) pairs

  • ClassValues: List of possible class values

  • Options: List of options used during learning

When exported using export_to_clauses/4 or export_to_file/4, this classifier term is serialized directly as the single argument of the generated predicate clause so that the exported model can be loaded and reused as-is.

References

  1. Breiman, L. (2001). “Random Forests”. Machine Learning, 45(1), 5-32.

  2. Ho, T.K. (1995). “Random Decision Forests”. Proceedings of the 3rd International Conference on Document Analysis and Recognition.

  3. Quinlan, J.R. (1993). “C4.5: Programs for Machine Learning”. Morgan Kaufmann.

Usage

Learning a Classifier

% Learn a random forest with default options (10 trees)
| ?- random_forest_classifier::learn(play_tennis, Classifier).
...

% Learn with custom options
| ?- random_forest_classifier::learn(play_tennis, Classifier, [number_of_trees(20), maximum_features_per_tree(2), random_seed(17)]).
...

Making Predictions

% Predict class for a new instance
| ?- random_forest_classifier::learn(play_tennis, Classifier),
     random_forest_classifier::predict(Classifier, [outlook-sunny, temperature-hot, humidity-high, wind-weak], Class).
Class = no
...

% Get probability distribution from ensemble voting
| ?- random_forest_classifier::learn(play_tennis, Classifier),
     random_forest_classifier::predict_probabilities(Classifier, [outlook-overcast, temperature-mild, humidity-normal, wind-weak], Probabilities).
Probabilities = [yes-0.9, no-0.1]
...

Exporting the Classifier

% Export as predicate clauses
| ?- random_forest_classifier::learn(play_tennis, Classifier),
     random_forest_classifier::export_to_clauses(play_tennis, Classifier, my_forest, Clauses).
Clauses = [my_forest(rf_classifier(...))]
...

% Export to a file
| ?- random_forest_classifier::learn(play_tennis, Classifier),
     random_forest_classifier::export_to_file(play_tennis, Classifier, my_forest, 'forest.pl').
...

Using a Saved Classifier

% Load and use a previously saved classifier
| ?- logtalk_load('forest.pl'),
     my_forest(Classifier),
     random_forest_classifier::predict(Classifier, [outlook-sunny, temperature-cool, humidity-normal, wind-weak], Class).
Class = yes
...

Printing the Classifier

% Print a summary of the random forest
| ?- random_forest_classifier::learn(play_tennis, Classifier),
     random_forest_classifier::print_classifier(Classifier).

Random Forest Classifier
========================

Number of trees: 10
Class values: [yes,no]
Options: [number_of_trees(10)]

Trees:
  Tree 1 (features: [outlook,humidity]):
    -> tree rooted at outlook
  ...
...