.. _library_cusum_anomaly_detector:

``cusum_anomaly_detector``
==========================

CUSUM (Cumulative Sum Control Chart) anomaly detector for continuous
sequence-like datasets. This is a statistical anomaly-detection method
based on a two-sided CUSUM control chart. Declared continuous attributes
are interpreted as ordered monitoring steps.

The library implements the ``anomaly_detector_protocol`` defined in the
``anomaly_detection_protocols`` library. It learns a detector from a
continuous dataset, computes anomaly scores for new instances, predicts
``normal`` or ``anomaly``, and exports learned detectors as clauses or
files.

Datasets are represented as objects implementing the
``anomaly_dataset_protocol`` protocol from the
``anomaly_detection_protocols`` library. See the
``cusum_anomaly_detector/tests.lgt`` file for example datasets.

API documentation
-----------------

Open the
`../../apis/library_index.html#cusum_anomaly_detector <../../apis/library_index.html#cusum_anomaly_detector>`__
link in a web browser.

Loading
-------

To load this library, load the ``loader.lgt`` file:

::

   | ?- logtalk_load(cusum_anomaly_detector(loader)).

Testing
-------

To test this library's predicates, load the ``tester.lgt`` file:

::

   | ?- logtalk_load(cusum_anomaly_detector(tester)).

Features
--------

- **Statistical method**: implements anomaly detection based on a
  two-sided CUSUM control chart, using learned per-step population means
  and standard deviations for continuous attributes.

- **Ordered sequence interpretation**: declared continuous attributes
  are treated as ordered monitoring steps. For each known step value
  ``x_t``, the library computes ``z_t = (x_t - mu_t) / sigma_t`` and
  updates the positive and negative CUSUM recurrences along that
  attribute order. The learned detector stores a precomputed attribute
  schema so that this ordering does not need to be rebuilt for every
  scoring call.

- **CUSUM recurrences**: the positive and negative cumulative sums are
  updated as ``C+_t = max(0, C+_(t-1) + z_t - k)`` and
  ``C-_t = max(0, C-_(t-1) - z_t - k)``, where ``k`` is the learn-time
  allowance. The raw anomaly score is the maximum excursion over all
  positive and negative cumulative sums.

- **Continuous features only**: accepts datasets whose declared
  attributes are all ``continuous``.

- **Baseline training selection**: supports learn-time
  ``baseline_class_values(ClassValues)`` and
  ``baseline_selection_policy(Policy)`` options. The default baseline
  class values are ``[normal]``. The default ``reject`` policy throws an
  error if any non-baseline training example is found. The ``filter``
  policy removes non-baseline examples before fitting the baseline
  statistics.

- **Missing-value tolerant**: ignores missing values when fitting
  per-step statistics and skips them during scoring. Queries must still
  provide at least one known value.

- **Bounded scoring**: maps the raw CUSUM excursion to ``[0.0, 1.0)``
  using ``Score = Raw / (1 + Raw)``.

- **CUSUM control parameters**: supports learn-time ``allowance/1`` and
  ``decision_interval/1`` options. The default ``allowance(0.5)`` and
  ``decision_interval(5.0)`` correspond to a common standardized CUSUM
  setup.

- **Default threshold**: the default
  ``anomaly_threshold(0.8333333333333334)`` corresponds to the default
  raw decision interval ``5.0``. If a custom ``decision_interval/1`` is
  passed to ``learn/3`` without an explicit ``anomaly_threshold/1``, the
  stored anomaly threshold is derived automatically as ``H / (1 + H)``.

- **Learn-time control parameters**: ``allowance/1`` and
  ``decision_interval/1`` are recorded in the learned detector and
  reused for subsequent scoring and prediction. Passing them to
  ``predict/4`` does not override the learned values. Only
  ``anomaly_threshold/1`` can be overridden at predict time.

- **All-missing queries rejected**: scoring and prediction throw a
  ``domain_error(non_empty_known_values, AttributeNames)`` exception
  when every declared step is missing in the query.

- **Featureless datasets rejected**: datasets must declare at least one
  continuous feature; otherwise ``learn/2-3`` throws a
  ``domain_error(non_empty_features, Dataset)`` exception.

- **Detector export**: learned detectors can be exported as predicate
  clauses.

- **Explicit validation and diagnostics**: supports the shared
  ``check_anomaly_detector/1``, ``valid_anomaly_detector/1``,
  ``diagnostics/2``, ``diagnostic/2``, and
  ``anomaly_detector_options/2`` predicates.
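The two-sided CUSUM recurrences and the bounded score mapping listed
above can be sketched as follows. This is an illustrative Python
sketch, not the library's implementation; the function names are
hypothetical, and the standardized deviations ``zs`` are assumed to
have been computed from the learned per-step means and scales.

```python
def cusum_raw_score(zs, k=0.5):
    """Maximum excursion of the positive and negative cumulative
    sums over the standardized deviations zs, with allowance k."""
    c_pos = c_neg = raw = 0.0
    for z in zs:
        c_pos = max(0.0, c_pos + z - k)  # C+_t = max(0, C+_(t-1) + z_t - k)
        c_neg = max(0.0, c_neg - z - k)  # C-_t = max(0, C-_(t-1) - z_t - k)
        raw = max(raw, c_pos, c_neg)
    return raw


def bounded_score(raw):
    # Map the raw excursion to the interval [0.0, 1.0)
    return raw / (1.0 + raw)
```

Note that upward and downward shifts are symmetric: a run of large
positive deviations grows ``C+`` while the same run negated grows
``C-``, producing the same raw score.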

Options
-------

The following options are supported by the public API:

- ``anomaly_threshold(Threshold)``: Threshold for ``predict/3-4``
  (default: ``0.8333333333333334``)
- ``allowance(Allowance)``: Learn-time CUSUM allowance ``k`` (default:
  ``0.5``)
- ``baseline_class_values(ClassValues)``: Learn-time class labels that
  are admissible for baseline fitting (default: ``[normal]``)
- ``baseline_selection_policy(Policy)``: Learn-time handling of examples
  whose class is not listed in ``baseline_class_values/1``. Supported
  values are ``reject`` and ``filter`` (default: ``reject``)
- ``decision_interval(DecisionInterval)``: Learn-time raw decision
  interval ``H`` (default: ``5.0``). If no explicit
  ``anomaly_threshold/1`` is passed to ``learn/3``, the stored threshold
  is derived from this value as ``H / (1 + H)``.
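The relation between the raw decision interval and the stored anomaly
threshold can be checked directly. A minimal sketch, using a
hypothetical helper name:

```python
def derived_threshold(decision_interval):
    # Stored threshold derived from the raw decision interval H
    # as H / (1 + H)
    return decision_interval / (1.0 + decision_interval)


# The default decision_interval(5.0) yields the default
# anomaly_threshold(0.8333333333333334)
```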

Detector representation
-----------------------

The learned detector is represented by default as:

::

   cusum_detector(TrainingDataset, AttributeSchema, Encoders, Diagnostics)

Where:

- ``TrainingDataset``: training dataset object identifier
- ``AttributeSchema``: precomputed attribute ordering metadata used to
  validate and reorder query step values efficiently during scoring
- ``Encoders``: list of ``cusum_encoder(Attribute, Mean, Scale)``
  records
- ``Diagnostics``: learned metadata terms including ``model/1``,
  ``training_dataset/1``, ``attribute_names/1``, ``feature_count/1``,
  ``example_count/1``, and ``options/1``. The ``example_count/1`` value
  is the effective number of training examples after applying the
  selected baseline selection policy.

When exported using ``export_to_clauses/4`` or ``export_to_file/4``,
this detector term is serialized directly as the single argument of the
generated predicate clause so that the exported model can be loaded and
reused as-is.

Notes
-----

Scoring has three stages. First, the detector computes one standardized
deviation ``z_t = (x_t - mu_t) / sigma_t`` for each known monitoring
step. Second, those deviations are processed sequentially using the
positive and negative CUSUM recurrences with the learned ``allowance/1``
value. Third, the maximum raw excursion is mapped to the interval
``[0.0, 1.0)`` using ``Score = Raw / (1 + Raw)``.
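The three stages can be sketched end-to-end as follows. This is an
illustrative Python sketch under assumed conventions, not the
library's code: ``stats`` stands for the learned per-step
``(mu_t, sigma_t)`` pairs in attribute order, and missing query values
are represented as ``None``.

```python
def score(query, stats, k=0.5):
    c_pos = c_neg = raw = 0.0
    for x, (mu, sigma) in zip(query, stats):
        if x is None:                     # missing steps are skipped
            continue
        z = (x - mu) / sigma              # stage 1: standardize
        c_pos = max(0.0, c_pos + z - k)   # stage 2: CUSUM recurrences
        c_neg = max(0.0, c_neg - z - k)
        raw = max(raw, c_pos, c_neg)
    return raw / (1.0 + raw)              # stage 3: bounded mapping
```

For example, two consecutive three-sigma deviations give a raw
excursion of ``5.0`` and thus a score equal to the default threshold
``5.0 / 6.0``.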

The ``allowance/1`` option changes the CUSUM update rule itself by
controlling how much drift must accumulate before the chart grows.
Larger values make the detector less sensitive to small shifts. The
``decision_interval/1`` option does not change scoring; it only affects
the default threshold stored when learning a detector.

The ``baseline_class_values/1`` option declares which dataset class
labels are admissible for baseline fitting. The
``baseline_selection_policy/1`` option then controls what happens when
other labels are present in the training data. The default ``reject``
policy raises a ``domain_error(baseline_only_training_data, Dataset)``
exception when any non-baseline example is found. The ``filter`` policy
removes non-baseline examples before fitting and raises a
``domain_error(non_empty_baseline_training_data, Dataset)`` exception if
no training examples remain after filtering.
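The interplay of the two options can be sketched as follows, using
plain ``(ClassLabel, Values)`` pairs instead of the library's dataset
objects; the helper name and error strings are hypothetical
stand-ins for the actual exception terms.

```python
def select_baseline(examples, baseline=("normal",), policy="reject"):
    kept = [example for example in examples if example[0] in baseline]
    if policy == "reject" and len(kept) != len(examples):
        # default policy: any non-baseline example is an error
        raise ValueError("baseline_only_training_data")
    if not kept:
        # filter policy left nothing to fit the baseline on
        raise ValueError("non_empty_baseline_training_data")
    return kept
```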

Attributes with zero observed dispersion are assigned a fallback scale
of ``1.0``. This keeps the detector well-defined for singleton datasets
or constant steps while still yielding zero score for matching values
and positive scores for deviating values.
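The fallback can be sketched per step as follows. This is an
illustrative Python sketch with a hypothetical helper name: ``values``
holds one monitoring step's training values, with missing values
represented as ``None``.

```python
import statistics

def fit_step(values):
    known = [v for v in values if v is not None]
    mu = statistics.fmean(known)
    # population standard deviation; 0.0 for a singleton step
    sigma = statistics.pstdev(known) if len(known) > 1 else 0.0
    # zero observed dispersion falls back to a scale of 1.0
    return mu, sigma if sigma > 0.0 else 1.0
```

With a constant step, a query value equal to the mean standardizes to
zero (zero score contribution), while any deviation still yields a
nonzero standardized deviation.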
