classification_protocols
This library provides protocols used in the implementation of machine
learning classifier algorithms. Datasets are represented as objects
implementing the dataset_protocol protocol. Classifiers are
represented as objects implementing the classifier_protocol
protocol.
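As an illustration, a small dataset object might look like the sketch below. The object name toy_weather and its data are hypothetical. The predicate names (attribute_values/2, class/1, class_values/1, example/3) are assumptions about what dataset_protocol declares and should be checked against the API documentation.

```logtalk
% Hypothetical toy dataset. The predicate names and argument shapes
% are assumptions about dataset_protocol, not confirmed by this text.
:- object(toy_weather,
	implements(dataset_protocol)).

	% attribute names paired with their possible discrete values
	attribute_values(outlook, [sunny, overcast, rainy]).
	attribute_values(wind, [weak, strong]).

	% name of the target class and its possible values
	class(play).
	class_values([yes, no]).

	% example(Id, Class, AttributeValuePairs)
	example(1, no,  [outlook-sunny, wind-weak]).
	example(2, yes, [outlook-overcast, wind-strong]).
	example(3, yes, [outlook-rainy, wind-weak]).

:- end_object.
```

Keeping each dataset in its own object lets the same classifier be applied to different datasets by passing the object identifier.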
This library also provides reusable shared categories, smoke tests, and test datasets. See below for details.
Logtalk currently provides several classifiers including
c45_classifier, knn_classifier, linear_svm_classifier,
logistic_regression_classifier, naive_bayes_classifier,
nearest_centroid_classifier, and random_forest_classifier. See
the documentation of these libraries for details.
Diagnostics
The classifier_common category provides shared accessor predicates
such as diagnostics/2, diagnostic/2, and
classifier_options/2. These predicates make it possible to inspect
learned-classifier metadata without depending on the exact term
representation used by a particular classifier implementation.
The detailed contents of the diagnostics data are classifier-dependent. For example, some classifiers report effective training options, while others report structural metadata such as attribute names, feature types, or the number of training examples or models.
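The accessor predicates could be queried along the following lines. This is a sketch only: it assumes that a knn_classifier object exists and answers learn/2 as learn(Dataset, Classifier), that the iris test dataset is loaded, and that the first argument of each accessor is the learned classifier term. None of these details are confirmed here.

```logtalk
% Sketch only; object name, learn/2 usage, and argument order
% of the accessors are assumptions.

% retrieve all diagnostics at once
| ?- knn_classifier::learn(iris, Classifier),
     knn_classifier::diagnostics(Classifier, Diagnostics).

% retrieve a single diagnostic term (presumably enumerated on backtracking)
| ?- knn_classifier::learn(iris, Classifier),
     knn_classifier::diagnostic(Classifier, Diagnostic).

% retrieve the options recorded with the learned classifier
| ?- knn_classifier::learn(iris, Classifier),
     knn_classifier::classifier_options(Classifier, Options).
```

Because the bindings are classifier-dependent, code that consumes them should avoid relying on a fixed list of diagnostic terms.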
Export header format
The shared classifier exporter in the classifier_common category
writes a header before the exported clauses in the following format:
% exported classifier predicate: Functor/Arity
% training dataset: Dataset
% dataset prediction schema: Functor(Attribute1, ..., AttributeN, Class)
% diagnostics: Diagnostics
% Functor(Classifier)
Functor(Classifier)
The dataset prediction schema line always uses an ASCII-only title
case conversion for the attribute names and class. This line documents
the dataset-level prediction interface for readability, even when the
exported clauses serialize a model term instead of an executable
predictor relation.
When exporting a serialized classifier term, using a noun such as
classifier/1 or model/1 is recommended.
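For illustration, the first four header lines for a classifier learned from the play_tennis dataset and exported using the classifier/1 functor might read as follows. This is a hypothetical rendering: it assumes that the class is named play, and the diagnostics value is left as a placeholder because its contents are classifier-dependent.

```logtalk
% Hypothetical header; the class name "play" is an assumption and
% "Diagnostics" stands in for the classifier-dependent diagnostics term.
% exported classifier predicate: classifier/1
% training dataset: play_tennis
% dataset prediction schema: classifier(Outlook, Temperature, Humidity, Wind, Play)
% diagnostics: Diagnostics
```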
API documentation
Open the ../../apis/library_index.html#classification_protocols link in a web browser.
Loading
To load all entities in this library, load the loader.lgt file:
| ?- logtalk_load(classification_protocols(loader)).
Testing
To test this library's predicates, shared categories, and datasets, load
the tester.lgt file:
| ?- logtalk_load(classification_protocols(tester)).
Test datasets
Several sample datasets are included in the test_datasets directory:
play_tennis.lgt: The classic weather/tennis dataset with 14 examples and 4 discrete attributes (outlook, temperature, humidity, wind). Originally from Quinlan (1986) and widely used in machine learning textbooks including Mitchell (1997). Also available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/dataset/349/tennis+major+tournament+match+statistics

contact_lenses.lgt: A dataset with 24 examples and 4 discrete attributes (age, spectacle prescription, astigmatism, tear production rate) for deciding the type of contact lenses to prescribe. Originally from Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27(4), 349-370. Available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/dataset/58/lenses

iris.lgt: The classic Iris flower dataset with 150 examples and 4 continuous attributes (sepal length, sepal width, petal length, petal width) for classifying iris species (setosa, versicolor, virginica). Originally from Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179-188. Available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/dataset/53/iris

breast_cancer.lgt: A dataset with 286 examples and 9 discrete attributes (age, menopause, tumor size, inv-nodes, node-caps, degree of malignancy, breast, breast quadrant, irradiation) for predicting breast cancer recurrence events. Contains missing values (9 examples with missing values in the node-caps and breast-quad attributes, represented using anonymous variables). Originally from the Institute of Oncology, University Medical Centre, Ljubljana, Yugoslavia. Donors: Ming Tan and Jeff Schlimmer. Available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/dataset/14/breast+cancer