anomaly_detection_protocols
This library provides protocols used in the implementation of machine
learning anomaly-detection algorithms. Datasets are represented as
objects implementing the anomaly_dataset_protocol protocol. Anomaly
detectors are represented as objects importing the
anomaly_detector_common category, which implements the
anomaly_detector_protocol protocol. The category provides shared
learn/2, predict/3-4, diagnostics/2, diagnostic/2, and
anomaly_detector_options/2 predicates, plus file export, baseline
training-selection helpers, and dataset helper predicates. It keeps
threshold-based
prediction and export behavior separate from the algorithm-specific
learning, scoring, clause export, pretty-printing, and diagnostics
metadata code.
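As an illustration of the intended workflow, the sketch below shows a query against a hypothetical detector object. The object name my_detector, the example term shape, and the exact argument orders of learn/2 and predict/3 are assumptions based on the description above, not a verbatim API transcript:

```logtalk
% Hypothetical workflow sketch: my_detector stands for an object importing
% the anomaly_detector_common category; gaussian_anomalies is one of the
% bundled dataset objects. Argument orders are assumptions.
| ?- my_detector::learn(gaussian_anomalies, Detector),
     my_detector::predict(Detector, Example, Class).
% Example stands for a dataset example term; its exact shape is
% algorithm- and dataset-specific.
```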
The shared category also provides reusable protected predicates for
baseline-only training workflows. Detector implementations can support
the baseline_class_values/1 and baseline_selection_policy/1 options
via a single helper instead of locally reimplementing class-label
validation and baseline-example filtering or rejection logic.
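For example, a baseline-only training setup might pass options such as the following. This is a sketch: the class label normal and the policy value filter are hypothetical illustrations, not values documented by this library:

```logtalk
% Hypothetical options list for baseline-only training: train only on
% examples labeled normal, filtering out (rather than rejecting) any
% non-baseline examples. Both values shown here are assumptions.
Options = [baseline_class_values([normal]), baseline_selection_policy(filter)]
```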
Learned detector terms can be validated explicitly using the shared
check_anomaly_detector/1 and valid_anomaly_detector/1
predicates. This validation API is never called implicitly by scoring,
prediction, printing, or export predicates.
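A sketch of explicit validation after training (the object name and the learn/2 argument order are assumptions):

```logtalk
% Assumed semantics, following the usual Logtalk check/valid predicate
% conventions: check_anomaly_detector/1 throws an error for invalid
% detector terms, while valid_anomaly_detector/1 simply fails.
| ?- my_detector::learn(sensor_anomalies, Detector),
     my_detector::check_anomaly_detector(Detector).
```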
The library also includes anomaly benchmark datasets and a small
family-level smoke-test suite.
Export header format
The shared exporter in the anomaly_detector_common category writes a
header before the exported clauses in the following format:
% exported anomaly detector predicate: Functor/Arity
% training dataset: Dataset
% options: Options
% Functor(Detector)
Functor(Detector).
The exported clauses serialize the learned detector term as a single
predicate argument so that loading the file gives a detector term that
can be passed directly to the predict/3-4 and score_all/3
predicates.
When naming the exported predicate, a noun such as detector/1 or
model/1 is recommended.
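Putting the pieces together, an exported file might look like the following sketch. The dataset name, empty options list, and detector term are hypothetical; in an actual file, the single argument is the ground, algorithm-specific learned detector term:

```logtalk
% exported anomaly detector predicate: detector/1
% training dataset: gaussian_anomalies
% options: []
% detector(Detector)
detector(LearnedTerm).  % LearnedTerm is a placeholder for the ground,
                        % algorithm-specific learned detector term
```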
API documentation
Open the ../../apis/library_index.html#anomaly-detection-protocols link in a web browser.
Loading
To load all entities in this library, load the loader.lgt file:
| ?- logtalk_load(anomaly_detection_protocols(loader)).
Testing
To run the library smoke tests, shared category tests, and dataset
checks, load the tester.lgt file:
| ?- logtalk_load(anomaly_detection_protocols(tester)).
Test datasets
Several sample datasets are included in the test_datasets directory:
- gaussian_anomalies.lgt: A synthetic 2D anomaly detection dataset with 48 examples and 2 continuous attributes (x, y). Normal points are sampled from a standard normal distribution centered at the origin. Anomalous points are placed far from the cluster center. Inspired by the canonical test case used in the Extended Isolation Forest paper by Hariri et al. (2019).
- malformed_anomalies.lgt: A negative fixture with invalid class labels for testing family-level dataset validation.
- mixed_anomalies.lgt: A small mixed-feature anomaly dataset with 16 examples, 2 continuous attributes (age, income), and 2 categorical attributes (student, credit_rating). Includes missing values and uncommon feature combinations to exercise anomaly-detection code on heterogeneous data.
- mixed_distance_behaviors.lgt: A compact mixed-feature anomaly fixture with 8 examples, 2 continuous attributes (size, weight), and 2 categorical attributes (color, shape). Intended for smoke-testing continuous plus categorical distance behavior and basic mixed-data handling.
- sensor_anomalies.lgt: A synthetic industrial sensor anomaly dataset with 40 examples and 3 continuous attributes (temperature, pressure, vibration). Contains missing values in 14 examples, represented using anonymous variables. Normal readings cluster around typical operating ranges. Anomalous readings show extreme values indicating equipment malfunction.
- shuttle_anomalies.lgt: A subset of the Statlog Shuttle dataset with 50 examples and 9 continuous attributes representing sensor readings from the NASA Space Shuttle. Class 1 (Rad Flow) is the majority class (normal), while all other classes are treated as anomalies. Originally from Catlett, J. (1991). Available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/dataset/148/statlog+shuttle
- water_potability.lgt: A water potability dataset with 48 examples and 9 continuous attributes (pH, hardness, solids, chloramines, sulfate, conductivity, organic carbon, trihalomethanes, turbidity). Normal instances represent potable water samples within acceptable ranges. Anomalous instances represent water samples with hazardous contamination levels. Based on the publicly available Water Quality dataset (Kadiwal, A., 2020, Kaggle).