dimension_reduction_protocols

This library provides protocols and reusable support predicates used in the implementation of machine learning dimension reduction algorithms. Unsupervised datasets are represented as objects implementing the dimension_reduction_dataset_protocol protocol. Labeled datasets for supervised reducers such as Linear Discriminant Analysis are represented as objects implementing the supervised_dimension_reduction_dataset_protocol protocol. Target-valued datasets for reducers such as partial least squares projections are represented as objects implementing the target_supervised_dimension_reduction_dataset_protocol protocol. Dimension reducers are represented as objects importing the dimension_reducer_common category.
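The entity relations described above can be sketched as follows. This is only a minimal skeleton: the object names my_measurements and my_projection are hypothetical, and the predicates that each protocol declares are deliberately left out instead of being guessed (see the API documentation for them).

```logtalk
% hypothetical entity names, used only to illustrate the relations
:- object(my_measurements,
	implements(dimension_reduction_dataset_protocol)).

	% clauses for the dataset predicates declared by the protocol go here

:- end_object.

:- object(my_projection,
	imports(dimension_reducer_common)).

	% reducer-specific learning code goes here; the shared helpers
	% come from the imported category

:- end_object.
```

A labeled or target-valued dataset would presumably follow the same pattern, implementing supervised_dimension_reduction_dataset_protocol or target_supervised_dimension_reduction_dataset_protocol instead.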

This library also provides reusable test datasets and a small smoke-test suite.

Concrete dimension reduction algorithms are intentionally out of scope for this package. The goal is to provide a portable foundation for libraries such as pca_projection, truncated_svd_projection, random_projection, nmf_projection, lda_projection, and pls_projection. That foundation includes shared reducer helpers for common tasks such as feature encoding, projection, reducer validation, diagnostics, export, and pretty-printing.

All reducers export their learned model in the same form: a clause of a single-argument predicate such as reducer(Reducer). The saved reducer term can therefore be loaded and passed directly to transform/3.
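As a hedged sketch of that round trip, the queries below load a saved reducer and use it. Several details are assumptions and not confirmed by this document: my_reducer.pl is a hypothetical file holding one exported reducer(Reducer) clause, pca_projection is assumed to be the name of the object that learned it, the transform/3 argument order shown is a guess, and Instance stands for an example in whatever representation the reducer expects.

```logtalk
% assumption: my_reducer.pl is a hypothetical file containing a single
% exported reducer(Reducer) clause
| ?- consult('my_reducer.pl').

% assumption: transform/3 takes the reducer term first, then the input
% example, and returns the projected example in its last argument
| ?- reducer(Reducer),
     pca_projection::transform(Reducer, Instance, Projected).
```

Check the API documentation of the reducer library in use for the actual argument order and example representation.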

All reducers importing the shared category are also expected to record a diagnostics list containing at least model/1 and options/1 terms. This metadata is accessible through the diagnostics/2, diagnostic/2, and dimension_reducer_options/2 predicates, and serialized reducers can be validated using check_dimension_reducer/1.
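A query along the following lines illustrates how this metadata might be inspected. It is a sketch under assumptions: a reducer(Reducer) clause is already loaded, pca_projection is assumed to be the object that produced it, and each predicate is assumed to take the reducer term as its first argument.

```logtalk
% assumption: the reducer term is the first argument of every predicate
| ?- reducer(Reducer),
     % validate the serialized reducer before using it
     pca_projection::check_dimension_reducer(Reducer),
     % retrieve the full diagnostics list
     pca_projection::diagnostics(Reducer, Diagnostics),
     % retrieve a single diagnostics term, here model/1
     pca_projection::diagnostic(Reducer, model(Model)),
     % retrieve the options recorded when the reducer was learned
     pca_projection::dimension_reducer_options(Reducer, Options).
```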

API documentation

Open the ../../apis/library_index.html#dimension_reduction_protocols link in a web browser.

Loading

To load all entities in this library, load the loader.lgt file:

| ?- logtalk_load(dimension_reduction_protocols(loader)).

Common options

The dimension_reducer_common category supports the feature_scaling/1 option used by importing reducers to control continuous feature normalization before projection:

  • feature_scaling(true) standardizes each continuous attribute using its training-set mean and standard deviation.

  • feature_scaling(false) only centers each continuous attribute using its training-set mean.

The current pca_projection, random_projection, lda_projection, and pls_projection libraries all define feature_scaling(true) as their default.
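The difference between the two settings can be illustrated with a small standalone object. This is an illustrative sketch, not the category's implementation: the object name feature_scaling_sketch and its scale/3 predicate are hypothetical, and the use of the population standard deviation (dividing by N) is an assumption, as this document does not say which estimator the library uses.

```logtalk
% Illustrative sketch only; not the library's implementation.
:- object(feature_scaling_sketch).

	:- public(scale/3).
	% scale(+Flag, +Values, -Scaled)
	% Values is the training column of one continuous attribute

	scale(true, Values, Scaled) :-
		mean(Values, Mean),
		deviation(Values, Mean, Deviation),
		% a constant column (zero deviation) is not handled in this sketch
		rescale(Values, Mean, Deviation, Scaled).
	scale(false, Values, Scaled) :-
		mean(Values, Mean),
		% dividing by 1 leaves the centered values unchanged
		rescale(Values, Mean, 1, Scaled).

	mean(Values, Mean) :-
		sum_and_count(Values, 0, Sum, 0, Count),
		Mean is Sum / Count.

	sum_and_count([], Sum, Sum, Count, Count).
	sum_and_count([Value| Values], Sum0, Sum, Count0, Count) :-
		Sum1 is Sum0 + Value,
		Count1 is Count0 + 1,
		sum_and_count(Values, Sum1, Sum, Count1, Count).

	% assumption: population standard deviation (divides by N, not N - 1)
	deviation(Values, Mean, Deviation) :-
		squared_differences(Values, Mean, Squares),
		mean(Squares, Variance),
		Deviation is sqrt(Variance).

	squared_differences([], _, []).
	squared_differences([Value| Values], Mean, [Square| Squares]) :-
		Square is (Value - Mean) * (Value - Mean),
		squared_differences(Values, Mean, Squares).

	% subtract the mean from each value and divide by Divisor
	rescale([], _, _, []).
	rescale([Value| Values], Mean, Divisor, [Scaled| Scaleds]) :-
		Scaled is (Value - Mean) / Divisor,
		rescale(Values, Mean, Divisor, Scaleds).

:- end_object.
```

Under these assumptions, the goal feature_scaling_sketch::scale(false, [2.0, 4.0, 6.0], Scaled) binds Scaled to [-2.0, 0.0, 2.0], while using true as the first argument yields values of roughly -1.22, 0.0, and 1.22.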

Reducers can also specialize preprocessing behavior. For example, the nmf_projection library only supports center(false) because the learned feature representation must remain non-negative.

Testing

To run the library smoke tests, load the tester.lgt file:

| ?- logtalk_load(dimension_reduction_protocols(tester)).

Test datasets

Several sample datasets are included in the test_datasets directory:

  • correlated_plane.lgt — A compact continuous dataset with 8 examples and 3 continuous attributes (x, y, z) where the features are strongly correlated. It is intended for PCA and projection smoke tests.

  • high_dimensional_measurements.lgt — A small continuous dataset with 10 examples and 6 continuous attributes (f1 through f6). It is intended for testing projected dimensionality and reducer output shape.

  • low_rank_rectangular.lgt — A compact continuous dataset with 4 examples and 3 continuous attributes whose third feature is the sum of the first two. It is intended for testing matrix-rank truncation and singular-value-based reducers such as truncated_svd_projection.

  • labeled_measurements.lgt — A compact labeled continuous dataset with 9 examples, 4 continuous attributes (length, width, height, weight), and 3 class labels (alpha, beta, gamma). It is intended for testing supervised reducers such as lda_projection.

  • parts_based_measurements.lgt — A compact non-negative continuous dataset with 5 examples and 4 continuous attributes (f1 through f4) built from two latent additive parts. It is intended for testing parts-based reducers such as nmf_projection.

  • target_latent_measurements.lgt — A compact target-valued continuous dataset with 6 examples, 4 continuous attributes (f1 through f4), and a numeric target attribute (score) influenced by two latent directions. It is intended for testing target-supervised reducers such as pls_projection.

Within this library's own smoke tests, target-supervised coverage is currently provided by the target_measurements object, which exercises the target_supervised_dimension_reduction_dataset_protocol contract.