cusum_anomaly_detector
CUSUM (Cumulative Sum Control Chart) anomaly detector for continuous sequence-like datasets. This is a statistical anomaly-detection method based on a two-sided CUSUM control chart. Declared continuous attributes are interpreted as ordered monitoring steps.
The library implements the anomaly_detector_protocol defined in the
anomaly_detection_protocols library. It learns a detector from a
continuous dataset, computes anomaly scores for new instances, predicts
normal or anomaly, and exports learned detectors as clauses or
files.
Datasets are represented as objects implementing the
anomaly_dataset_protocol protocol from the
anomaly_detection_protocols library. Declared continuous attributes
are interpreted as ordered monitoring steps in a sequence. See the
cusum_anomaly_detector/tests.lgt file for example datasets.
API documentation
Open the ../../apis/library_index.html#cusum_anomaly_detector link in a web browser.
Loading
To load this library, load the loader.lgt file:
| ?- logtalk_load(cusum_anomaly_detector(loader)).
Testing
To test this library's predicates, load the tester.lgt file:
| ?- logtalk_load(cusum_anomaly_detector(tester)).
Features
- Statistical method: implements anomaly detection based on a two-sided
  CUSUM control chart, using learned per-step population means and
  standard deviations for continuous attributes.
- Ordered sequence interpretation: declared continuous attributes are
  treated as ordered monitoring steps. For each known step value `x_t`,
  the library computes `z_t = (x_t - mu_t) / sigma_t` and updates the
  positive and negative CUSUM recurrences along that attribute order.
  The learned detector stores a precomputed attribute schema so that
  this ordering does not need to be rebuilt for every scoring call.
- CUSUM recurrences: the positive and negative cumulative sums are
  updated as `C+_t = max(0, C+_(t-1) + z_t - k)` and
  `C-_t = max(0, C-_(t-1) - z_t - k)`, where `k` is the learn-time
  allowance. The raw anomaly score is the maximum excursion over all
  positive and negative cumulative sums.
- Continuous features only: accepts datasets whose declared attributes
  are all continuous.
- Baseline training selection: supports learn-time
  `baseline_class_values(ClassValues)` and
  `baseline_selection_policy(Policy)` options. The default baseline
  class values are `[normal]`. The default `reject` policy throws an
  error if any non-baseline training example is found. The `filter`
  policy removes non-baseline examples before fitting the baseline
  statistics.
- Missing-value tolerant: ignores missing values when fitting per-step
  statistics and skips them during scoring. Queries must still provide
  at least one known value.
- Bounded scoring: maps the raw CUSUM excursion to `[0.0, 1.0)` using
  `Score = Raw / (1 + Raw)`.
- CUSUM control parameters: supports learn-time `allowance/1` and
  `decision_interval/1` options. The default `allowance(0.5)` and
  `decision_interval(5.0)` correspond to a common standardized CUSUM
  setup.
- Default threshold: the default `anomaly_threshold(0.8333333333333334)`
  corresponds to the default raw decision interval `5.0`. If a custom
  `decision_interval/1` is passed to `learn/3` without an explicit
  `anomaly_threshold/1`, the stored anomaly threshold is derived
  automatically as `H / (1 + H)`.
- Learn-time control parameters: `allowance/1` and `decision_interval/1`
  are recorded in the learned detector and reused for subsequent scoring
  and prediction. Passing them to `predict/4` does not override the
  learned values. Only `anomaly_threshold/1` can be overridden at
  predict time.
- All-missing queries rejected: scoring and prediction throw a
  `domain_error(non_empty_known_values, AttributeNames)` exception when
  every declared step is missing in the query.
- Featureless datasets rejected: datasets must declare at least one
  continuous feature; otherwise `learn/2-3` throws a
  `domain_error(non_empty_features, Dataset)` exception.
- Detector export: learned detectors can be exported as predicate
  clauses.
- Explicit validation and diagnostics: supports the shared
  `check_anomaly_detector/1`, `valid_anomaly_detector/1`,
  `diagnostics/2`, `diagnostic/2`, and `anomaly_detector_options/2`
  predicates.
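The per-step standardization, CUSUM recurrences, and bounded-score mapping described above can be sketched in Python (an illustrative sketch only, not the library's Logtalk implementation; all names are invented):

```python
def cusum_score(values, means, scales, k=0.5):
    """Two-sided CUSUM raw excursion mapped to [0.0, 1.0).

    values[i] may be None to mark a missing monitoring step, which is
    skipped, mirroring the library's missing-value handling.
    """
    c_pos = c_neg = raw = 0.0
    for x, mu, sigma in zip(values, means, scales):
        if x is None:                     # skip missing steps
            continue
        z = (x - mu) / sigma              # standardized deviation z_t
        c_pos = max(0.0, c_pos + z - k)   # C+_t recurrence
        c_neg = max(0.0, c_neg - z - k)   # C-_t recurrence
        raw = max(raw, c_pos, c_neg)      # maximum excursion so far
    return raw / (1.0 + raw)              # bounded score in [0.0, 1.0)
```

A sequence matching the learned baseline scores `0.0`, while a sustained shift accumulates a score close to `1.0`.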
Options
The following options are supported by the public API:
- `anomaly_threshold(Threshold)`: threshold for `predict/3-4` (default:
  `0.8333333333333334`)
- `allowance(Allowance)`: learn-time CUSUM allowance `k` (default:
  `0.5`)
- `baseline_class_values(ClassValues)`: learn-time class labels that
  are admissible for baseline fitting (default: `[normal]`)
- `baseline_selection_policy(Policy)`: learn-time handling of examples
  whose class is not listed in `baseline_class_values/1`; supported
  values are `reject` and `filter` (default: `reject`)
- `decision_interval(DecisionInterval)`: learn-time raw decision
  interval `H` (default: `5.0`); if no explicit `anomaly_threshold/1`
  is passed to `learn/3`, the stored threshold is derived from this
  value as `H / (1 + H)`
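The correspondence between the default decision interval and the default anomaly threshold follows directly from the bounded-score mapping; a minimal check (illustrative Python, function name invented):

```python
def threshold_from_decision_interval(h):
    # Score = Raw / (1 + Raw) evaluated at the raw decision interval H
    return h / (1.0 + h)

# the default decision_interval(5.0) yields the default
# anomaly_threshold(0.8333333333333334), i.e. 5/6
default_threshold = threshold_from_decision_interval(5.0)
```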
Detector representation
The learned detector is represented by default as:
`cusum_detector(TrainingDataset, AttributeSchema, Encoders, Diagnostics)`
Where:
- `TrainingDataset`: training dataset object identifier
- `AttributeSchema`: precomputed attribute ordering metadata used to
  validate and reorder query step values efficiently during scoring
- `Encoders`: list of `cusum_encoder(Attribute, Mean, Scale)` records
- `Diagnostics`: learned metadata terms including `model/1`,
  `training_dataset/1`, `attribute_names/1`, `feature_count/1`,
  `example_count/1`, and `options/1`. The `example_count/1` value is
  the effective number of training examples after applying the selected
  baseline selection policy.
When exported using export_to_clauses/4 or export_to_file/4,
this detector term is serialized directly as the single argument of the
generated predicate clause so that the exported model can be loaded and
reused as-is.
Notes
Scoring has three stages. First, the detector computes one standardized
deviation z_t = (x_t - mu_t) / sigma_t for each known monitoring
step. Second, those deviations are processed sequentially using the
positive and negative CUSUM recurrences with the learned allowance/1
value. Third, the maximum raw excursion is mapped to the interval
[0.0, 1.0) using Score = Raw / (1 + Raw).
The allowance/1 option changes the CUSUM update rule itself by
controlling how much drift must accumulate before the chart grows.
Larger values make the detector less sensitive to small shifts. The
decision_interval/1 option does not change scoring; it only affects
the default threshold stored when learning a detector.
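The sensitivity effect of the allowance can be seen numerically. In this sketch (illustrative Python; the deviations are assumed already standardized), a small sustained shift of 0.8 sigma grows the chart when `k` is below the shift and leaves it at zero when `k` is above it:

```python
def raw_excursion(zs, k):
    """Maximum two-sided CUSUM excursion over standardized deviations."""
    c_pos = c_neg = raw = 0.0
    for z in zs:
        c_pos = max(0.0, c_pos + z - k)
        c_neg = max(0.0, c_neg - z - k)
        raw = max(raw, c_pos, c_neg)
    return raw

drift = [0.8] * 6                      # sustained 0.8-sigma upward shift
small_k = raw_excursion(drift, 0.25)   # k < shift: chart accumulates
large_k = raw_excursion(drift, 1.0)    # k > shift: chart never grows
```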
The baseline_class_values/1 option declares which dataset class
labels are admissible for baseline fitting. The
baseline_selection_policy/1 option then controls what happens when
other labels are present in the training data. The default reject
policy raises a domain_error(baseline_only_training_data, Dataset)
exception when any non-baseline example is found. The filter policy
removes non-baseline examples before fitting and raises a
domain_error(non_empty_baseline_training_data, Dataset) exception if
no training examples remain after filtering.
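The two policies can be sketched as follows (hypothetical Python with invented names; the library raises the Logtalk `domain_error/2` terms above rather than Python exceptions):

```python
def select_baseline(examples, baseline_classes, policy="reject"):
    """examples: list of (features, class_label) pairs."""
    if policy == "reject":
        for _, label in examples:
            if label not in baseline_classes:
                # mirrors domain_error(baseline_only_training_data, Dataset)
                raise ValueError("non-baseline training example: %r" % (label,))
        return examples
    # policy == "filter": drop non-baseline examples before fitting
    kept = [example for example in examples if example[1] in baseline_classes]
    if not kept:
        # mirrors domain_error(non_empty_baseline_training_data, Dataset)
        raise ValueError("no baseline training examples remain")
    return kept
```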
Attributes with zero observed dispersion are assigned a fallback scale
of 1.0. This keeps the detector well-defined for singleton datasets
or constant steps while still yielding zero score for matching values
and positive scores for deviating values.
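Fitting the per-step statistics with this fallback can be sketched as (illustrative Python with invented names, using population statistics and `None` for missing values, per the behavior described above):

```python
def fit_step_stats(columns):
    """Fit (mean, scale) per monitoring step.

    columns is a list of per-step value lists; None marks a missing
    value and is ignored when fitting, as described above.
    """
    stats = []
    for column in columns:
        known = [v for v in column if v is not None]
        mean = sum(known) / len(known)
        # population variance (divide by n, not n - 1)
        variance = sum((v - mean) ** 2 for v in known) / len(known)
        scale = variance ** 0.5
        if scale == 0.0:
            scale = 1.0   # fallback scale for zero observed dispersion
        stats.append((mean, scale))
    return stats
```

With the fallback, a constant step gets scale `1.0`, so a matching query value standardizes to `z = 0` while any deviation yields a nonzero `z`.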