.. _library_ewma_anomaly_detector:

``ewma_anomaly_detector``
=========================

EWMA (Exponentially Weighted Moving Average) anomaly detector for
continuous sequence-like datasets. It is a statistical anomaly-detection
method based on a two-sided EWMA control chart: for each known step
value ``x_t``, it computes a standardized deviation, updates the EWMA
statistic ``E_t``, and uses the maximum normalized excursion
``|E_t| / (L*c_t)`` as the raw anomaly score, so a score of ``1.0``
corresponds exactly to reaching the chosen EWMA control limit.

The library implements the ``anomaly_detector_protocol`` defined in the
``anomaly_detection_protocols`` library. It learns a detector from a
continuous dataset, computes anomaly scores for new instances, predicts
``normal`` or ``anomaly``, and exports learned detectors as clauses or
files.

Datasets are represented as objects implementing the
``anomaly_dataset_protocol`` protocol from the
``anomaly_detection_protocols`` library. Declared continuous attributes
are interpreted as ordered monitoring steps in a sequence. See the
``ewma_anomaly_detector/tests.lgt`` file for example datasets.

API documentation
-----------------

Open the
`../../apis/library_index.html#ewma_anomaly_detector <../../apis/library_index.html#ewma_anomaly_detector>`__
link in a web browser.

Loading
-------

To load this library, load the ``loader.lgt`` file:

::

   | ?- logtalk_load(ewma_anomaly_detector(loader)).

Testing
-------

To test this library's predicates, load the ``tester.lgt`` file:

::

   | ?- logtalk_load(ewma_anomaly_detector(tester)).

Features
--------

- **Statistical method**: implements anomaly detection based on a
  two-sided EWMA control chart, using learned per-step population means
  and standard deviations for continuous attributes.

- **Ordered sequence interpretation**: declared continuous attributes
  are treated as ordered monitoring steps. For each known step value
  ``x_t``, the library computes ``z_t = (x_t - mu_t) / sigma_t`` and
  updates the EWMA recurrence
  ``E_t = lambda*z_t + (1 - lambda)*E_(t-1)`` with ``E_0 = 0``.

- **EWMA control limits**: the raw anomaly score is the maximum
  normalized excursion ``|E_t| / (L*c_t)``, where ``L`` is the
  learn-time ``control_limit_multiplier/1`` option and
  ``c_t = sqrt(lambda/(2-lambda) * (1 - (1-lambda)^(2*t)))`` is the
  classical EWMA control-limit factor after ``t`` EWMA updates. The
  learned detector stores a precomputed attribute schema so that query
  values can be reordered efficiently during scoring.

- **Continuous features only**: accepts datasets whose declared
  attributes are all ``continuous``.

- **Baseline training selection**: supports learn-time
  ``baseline_class_values(ClassValues)`` and
  ``baseline_selection_policy(Policy)`` options. The default baseline
  class values are ``[normal]``. The default ``reject`` policy throws an
  error if any non-baseline training example is found. The ``filter``
  policy removes non-baseline examples before fitting the baseline
  statistics.

- **Missing-value tolerant**: ignores missing values when fitting
  per-step statistics. During scoring, missing step values do not update
  the EWMA state or advance the EWMA update count. Queries must still
  provide at least one known value.

- **Bounded scoring**: maps the raw EWMA excursion to ``[0.0, 1.0)``
  using ``Score = Raw / (1 + Raw)``.

- **EWMA control parameters**: supports learn-time
  ``control_limit_multiplier/1`` and ``smoothing_factor/1`` options. The
  default ``control_limit_multiplier(3.0)`` and
  ``smoothing_factor(0.2)`` correspond to a common EWMA monitoring
  setup.

- **Default threshold**: the default ``anomaly_threshold(0.5)``
  corresponds to the chosen EWMA control limit after score
  normalization. Because the raw score is already normalized by the
  learned control limit, this threshold remains the same for any learned
  ``control_limit_multiplier/1`` value.

- **Learn-time control parameters**: ``control_limit_multiplier/1`` and
  ``smoothing_factor/1`` are recorded in the learned detector and reused
  for subsequent scoring and prediction. Passing them to ``predict/4``
  does not override the learned values. Only ``anomaly_threshold/1`` can
  be overridden at predict time.

- **All-missing queries rejected**: scoring and prediction throw a
  ``domain_error(non_empty_known_values, AttributeNames)`` exception
  when every declared step is missing in the query.

- **Featureless datasets rejected**: datasets must declare at least one
  continuous feature; otherwise ``learn/2-3`` throws a
  ``domain_error(non_empty_features, Dataset)`` exception.

- **Detector export**: learned detectors can be exported as predicate
  clauses.

- **Explicit validation and diagnostics**: supports the shared
  ``check_anomaly_detector/1``, ``valid_anomaly_detector/1``,
  ``diagnostics/2``, ``diagnostic/2``, and
  ``anomaly_detector_options/2`` predicates.
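The bounded-scoring and default-threshold points above can be
illustrated with a short Python sketch of the ``Score = Raw / (1 +
Raw)`` mapping (an illustration of the formula only, not the library's
implementation):

::

   def bounded_score(raw):
       # Map the raw excursion Raw = |E_t| / (L*c_t) into [0.0, 1.0)
       return raw / (1.0 + raw)

   print(bounded_score(0.0))  # 0.0 -- well inside the control limits
   print(bounded_score(1.0))  # 0.5 -- exactly at the control limit
   print(bounded_score(9.0))  # 0.9 -- far beyond the limit, still < 1.0

The mapping is strictly increasing, so comparing the bounded score
against the default ``anomaly_threshold(0.5)`` is equivalent to
comparing the raw excursion against ``1.0``.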

Options
-------

The following options are supported by the public API:

- ``anomaly_threshold(Threshold)``: Threshold for ``predict/3-4``
  (default: ``0.5``)
- ``baseline_class_values(ClassValues)``: Learn-time class labels that
  are admissible for baseline fitting (default: ``[normal]``)
- ``baseline_selection_policy(Policy)``: Learn-time handling of examples
  whose class is not listed in ``baseline_class_values/1``. Supported
  values are ``reject`` and ``filter`` (default: ``reject``)
- ``control_limit_multiplier(ControlLimitMultiplier)``: Learn-time EWMA
  control-limit multiplier ``L`` (default: ``3.0``)
- ``smoothing_factor(SmoothingFactor)``: Learn-time EWMA smoothing
  factor ``lambda`` (default: ``0.2``)

Detector representation
-----------------------

The learned detector is represented by default as:

::

   ewma_detector(TrainingDataset, AttributeSchema, Encoders, Diagnostics)

Where:

- ``TrainingDataset``: training dataset object identifier
- ``AttributeSchema``: precomputed attribute ordering metadata used to
  validate and reorder query step values efficiently during scoring
- ``Encoders``: list of ``ewma_encoder(Attribute, Mean, Scale)`` records
- ``Diagnostics``: learned metadata terms including ``model/1``,
  ``training_dataset/1``, ``attribute_names/1``, ``feature_count/1``,
  ``example_count/1``, and ``options/1``. The ``example_count/1`` value
  is the effective number of training examples after applying the
  selected baseline selection policy.

When exported using ``export_to_clauses/4`` or ``export_to_file/4``,
this detector term is serialized directly as the single argument of the
generated predicate clause so that the exported model can be loaded and
reused as-is.

Notes
-----

Scoring has three stages. First, the detector computes one standardized
deviation ``z_t = (x_t - mu_t) / sigma_t`` for each known monitoring
step. Second, those deviations are processed sequentially using the EWMA
recurrence ``E_t = lambda*z_t + (1 - lambda)*E_(t-1)`` with the learned
``smoothing_factor/1`` value. Third, the maximum normalized excursion
``|E_t| / (L*c_t)`` is mapped to the interval ``[0.0, 1.0)`` using
``Score = Raw / (1 + Raw)``.
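The three stages can be sketched in Python as follows (a simplified
illustration using hypothetical per-step baselines ``means`` and
``scales``; the actual implementation is in Logtalk):

::

   import math

   def ewma_score(values, means, scales, lam=0.2, limit=3.0):
       e = 0.0    # EWMA statistic, E_0 = 0
       t = 0      # number of EWMA updates performed
       raw = 0.0  # maximum normalized excursion seen so far
       for x, mu, sigma in zip(values, means, scales):
           z = (x - mu) / sigma                 # stage 1: standardize
           e = lam * z + (1.0 - lam) * e        # stage 2: EWMA recurrence
           t += 1
           c = math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
           raw = max(raw, abs(e) / (limit * c))
       return raw / (1.0 + raw)                 # stage 3: bound to [0.0, 1.0)

   # A sequence matching the baseline exactly scores 0.0; a single
   # three-sigma step reaches the control limit and scores about 0.5.
   print(ewma_score([5.0, 5.0], [5.0, 5.0], [1.0, 1.0]))
   print(ewma_score([8.0], [5.0], [1.0]))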

The control-limit factor ``c_t`` is computed from the number of actual
EWMA updates, not from the declared attribute position. Accordingly,
leading or intermediate missing step values neither update the EWMA
state nor widen the control limits.
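Assuming the same simplified Python sketch of the recurrence (with
``None`` standing in for a missing step value), this behavior can be
illustrated as:

::

   import math

   def ewma_raw(values, means, scales, lam=0.2, limit=3.0):
       e, t, raw = 0.0, 0, 0.0
       for x, mu, sigma in zip(values, means, scales):
           if x is None:   # missing step: no EWMA update, t unchanged
               continue
           z = (x - mu) / sigma
           e = lam * z + (1.0 - lam) * e
           t += 1          # t counts actual EWMA updates only
           c = math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
           raw = max(raw, abs(e) / (limit * c))
       return raw

   # A leading missing value yields the same raw excursion as the
   # shorter sequence without it:
   print(ewma_raw([None, 8.0], [5.0, 5.0], [1.0, 1.0])
         == ewma_raw([8.0], [5.0], [1.0]))  # True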

The ``smoothing_factor/1`` option changes the EWMA update rule itself.
Smaller values make the detector retain longer-term history while larger
values react more strongly to recent deviations. The
``control_limit_multiplier/1`` option scales the control limits directly
in the score path. Larger values therefore make the detector less
sensitive by requiring larger EWMA excursions before the raw score
reaches ``1.0``.
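For a single EWMA update the algebra simplifies: ``c_1 =
sqrt(lambda/(2-lambda) * (1 - (1-lambda)^2)) = lambda``, so the raw
score of one standardized deviation ``z`` is just ``|z| / L``. A small
Python sketch of this special case (an illustration of the formulas
only):

::

   import math

   def raw_after_one_update(z, lam, limit):
       e = lam * z
       c = math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** 2))
       return abs(e) / (limit * c)   # equals abs(z) / limit, since c = lam

   # A three-sigma step reaches the limit exactly with L = 3.0, but not
   # with the less sensitive L = 4.5:
   print(raw_after_one_update(3.0, 0.2, 3.0))  # ~1.0
   print(raw_after_one_update(3.0, 0.2, 4.5))  # ~0.667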

The ``baseline_class_values/1`` option declares which dataset class
labels are admissible for baseline fitting. The
``baseline_selection_policy/1`` option then controls what happens when
other labels are present in the training data. The default ``reject``
policy raises a ``domain_error(baseline_only_training_data, Dataset)``
exception when any non-baseline example is found. The ``filter`` policy
removes non-baseline examples before fitting and raises a
``domain_error(non_empty_baseline_training_data, Dataset)`` exception if
no training examples remain after filtering.

Attributes with zero observed dispersion are assigned a fallback scale
of ``1.0``. This keeps the detector well-defined for singleton datasets
or constant steps while still yielding zero score for matching values
and positive scores for deviating values.
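The fallback rule can be sketched in Python as follows (a hypothetical
per-step fitting helper, not the library's code):

::

   import math

   def fit_step(observations):
       # Fit a per-step baseline; fall back to scale 1.0 when the
       # observed dispersion is zero (constant step or singleton dataset)
       n = len(observations)
       mu = sum(observations) / n
       var = sum((x - mu) ** 2 for x in observations) / n
       sigma = math.sqrt(var)
       return mu, (sigma if sigma > 0.0 else 1.0)

   mu, scale = fit_step([7.0, 7.0, 7.0])  # constant step
   print(mu, scale)              # 7.0 1.0
   print((7.0 - mu) / scale)     # 0.0 for a matching value
   print((9.0 - mu) / scale)     # 2.0 for a deviating value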
