.. _library_optics_clusterer:

``optics_clusterer``
====================

OPTICS clusterer. It uses deterministic OPTICS ordering with
epsilon-based cluster extraction for the fixed clusterer protocol.
Supports continuous attributes only.

The library implements the ``clusterer_protocol`` defined in the
``clustering_protocols`` library. It provides predicates for learning a
clusterer from a dataset, assigning new instances to clusters, and
exporting the learned clusterer as a list of predicate clauses or to a
file.

Datasets are represented as objects implementing the
``clustering_dataset_protocol`` protocol from the
``clustering_protocols`` library.

API documentation
-----------------

Open the
`../../apis/library_index.html#optics_clusterer <../../apis/library_index.html#optics_clusterer>`__
link in a web browser.

Loading
-------

To load this library, load the ``loader.lgt`` file:

::

   | ?- logtalk_load(optics_clusterer(loader)).

Testing
-------

To test this library predicates, load the ``tester.lgt`` file:

::

   | ?- logtalk_load(optics_clusterer(tester)).

To run the performance benchmark suite, load the
``tester_performance.lgt`` file:

::

   | ?- logtalk_load(optics_clusterer(tester_performance)).

Features
--------

- **OPTICS Ordering**: Learns a deterministic ordering using
  density-based reachability over continuous datasets.
- **Adaptive Neighborhood Indexing**: Uses a low-dimensional
  epsilon-grid index when it is likely to be cheaper and otherwise falls
  back to a deterministic metric tree for neighborhood search during
  ordering construction. The search backend can also be selected
  explicitly.
- **Continuous Datasets**: Accepts datasets containing only continuous
  attributes.
- **Distance Metrics**: Supports Euclidean and Manhattan distances.
- **Optional Feature Scaling**: Continuous attributes can be
  standardized using z-score scaling.
- **Epsilon-Based Extraction**: Extracts clusters from the ordering
  using a configurable extraction epsilon threshold.
- **Noise Detection**: New instances not reachable from an extracted
  core cluster within the extraction threshold are assigned to
  ``noise``.
- **Prediction Pruning**: Classification reuses per-cluster core-point
  bounds to prune clusters that cannot beat the current best reachable
  match.
- **Portable Export**: Learned clusterers can be exported as clauses or
  files and reused later.

Options
-------

The following options can be passed to the ``learn/3`` predicate:

- ``ordering_and_extraction_epsilons(MaximumOrderingEpsilon, ExtractionEpsilon)``:
  Pair of epsilon thresholds where ``MaximumOrderingEpsilon`` is the
  neighborhood radius used while constructing the OPTICS ordering and
  ``ExtractionEpsilon`` is the threshold used when extracting clusters
  from the learned ordering and when classifying new instances. Default
  is ``ordering_and_extraction_epsilons(1.0, 1.0)``.
  ``ExtractionEpsilon`` must not be greater than
  ``MaximumOrderingEpsilon``.
- ``search_index(SearchIndex)``: Search backend selection used while
  constructing the OPTICS ordering. Options are ``auto`` (default),
  ``grid``, and ``metric_tree``.
- ``minimum_points(MinimumPoints)``: Minimum neighborhood size required
  for a point to be considered a core point. Default is ``2``.
- ``distance_metric(Metric)``: Distance metric to use. Options:
  ``euclidean`` (default) or ``manhattan``.
- ``feature_scaling(FeatureScaling)``: Whether to standardize continuous
  attributes before clustering. Options: ``on`` (default) or ``off``.

Clusterer representation
------------------------

The learned clusterer is represented as a compound term with the functor
chosen by the user when exporting the clusterer and arity 5. For
example:

::

   optics_clusterer(Encoders, Ordering, Clusters, Noise, Options)

Where:

- ``Encoders``: List of continuous attribute encoders storing attribute
  name, mean, and scale.
- ``Ordering``: List of ordered points annotated with reachability and
  core-distance information.
- ``Clusters``: List of extracted clusters in cluster-id order.
- ``Noise``: List of extracted noise points.
- ``Options``: Effective training options used to learn the clusterer.
