pca_projection
Principal Component Analysis reducer for continuous datasets. The
library implements the dimension_reducer_protocol defined in the
dimension_reduction_protocols library and learns a linear projection
by centering the training data, optionally standardizing continuous
attributes, computing the covariance matrix, and extracting principal
components using deterministic power iteration with deflation.
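The steps above can be illustrated with a minimal sketch, written in Python rather than Logtalk purely for illustration. It is not the library's implementation. The function names (`mat_vec`, `power_iteration`, `pca`), the fixed start vector, the use of the sample covariance (division by n - 1), and the fallback scale of 1.0 for constant columns are all assumptions made for this example.

```python
import math

def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def power_iteration(matrix, max_iter=1000, tol=1.0e-8):
    """Estimate the dominant eigenpair of a symmetric positive semidefinite matrix."""
    d = len(matrix)
    # Assumption: a fixed, non-random start vector keeps the run deterministic.
    # A start vector orthogonal to the dominant direction would defeat this sketch.
    v = [1.0 + i for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(max_iter):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm <= tol:  # the matrix is numerically zero along v
            return 0.0, v
        w = [x / norm for x in w]
        delta = max(abs(a - b) for a, b in zip(w, v))
        v = w
        if delta <= tol:  # direction stopped changing
            break
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

def pca(rows, n_components=2, scale=True, max_iter=1000, tol=1.0e-8):
    """Center (and optionally standardize) rows, then extract components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    if scale:
        # Sample standard deviation; constant columns fall back to 1.0 (assumption).
        stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
                for j in range(d)]
    else:
        stds = [1.0] * d
    z = [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]
    # Sample covariance matrix of the centered (and possibly scaled) data.
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)] for i in range(d)]
    components, variances = [], []
    for _ in range(n_components):
        eigenvalue, v = power_iteration(cov, max_iter, tol)
        if eigenvalue <= tol:  # remaining variance is negligible
            break
        components.append(v)
        variances.append(eigenvalue)
        # Deflation: remove the extracted direction before the next pass.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, stds, components, variances
```

In this sketch, points that lie exactly on a line yield a single component even when two are requested, because the deflated covariance matrix is numerically zero and the second eigenvalue falls below the tolerance.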
API documentation
Open the ../../apis/library_index.html#pca_projection link in a web browser.
Loading
To load this library, load the loader.lgt file:
| ?- logtalk_load(pca_projection(loader)).
Testing
To test this library's predicates, load the tester.lgt file:
| ?- logtalk_load(pca_projection(tester)).
Features
Continuous Datasets: Accepts datasets containing only continuous attributes.
Centering and Optional Scaling: Centers all attributes and optionally standardizes them before extracting principal directions.
Portable Eigensolver: Uses deterministic power iteration with deflation instead of backend-specific linear algebra libraries.
Projection API: Transforms a new instance into a list of component_N-Value pairs.
Model Export: Learned reducers can be exported as predicate clauses or written to a file.
Missing Values: Missing or nonnumeric values are rejected.
Options
The learn/3 predicate accepts the following options:
n_components/1: Number of principal components to extract. Requests that exceed the number of features raise domain_error(component_count, Requested-Maximum). The default is 2.
feature_scaling/1: Whether to standardize continuous attributes before extracting components. Options: true (default) or false.
maximum_iterations/1: Maximum number of power-iteration steps used when estimating each principal direction. The default is 1000.
tolerance/1: Positive convergence tolerance used both for power-iteration stopping and for deciding when deflated eigenvalues are negligible. The default is 1.0e-8.
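As an illustration of how these defaults and the component-count check fit together, here is a small Python sketch. It is not the library's code. `DEFAULTS`, `ComponentCountError`, and `resolve_options` are hypothetical names, and rejecting a non-positive tolerance with an error is an assumption based only on the option being described as positive.

```python
# Default option values as listed in the Options section.
DEFAULTS = {
    "n_components": 2,
    "feature_scaling": True,
    "maximum_iterations": 1000,
    "tolerance": 1.0e-8,
}

class ComponentCountError(ValueError):
    """Hypothetical stand-in for domain_error(component_count, Requested-Maximum)."""

    def __init__(self, requested, maximum):
        super().__init__(f"requested {requested} components, at most {maximum} available")
        self.requested = requested
        self.maximum = maximum

def resolve_options(user_options, feature_count):
    """Merge user options over the defaults and validate the result."""
    options = {**DEFAULTS, **user_options}
    if options["n_components"] > feature_count:
        raise ComponentCountError(options["n_components"], feature_count)
    if options["tolerance"] <= 0:
        # Assumption: how the library reacts to a non-positive tolerance is not stated.
        raise ValueError("tolerance must be positive")
    return options
```

With three features, `resolve_options({"n_components": 1}, 3)` keeps the other three defaults, while requesting four components raises the error carrying the pair (4, 3).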
Usage
The following examples use the sample datasets shipped with the
dimension_reduction_protocols library:
| ?- logtalk_load(dimension_reduction_protocols('test_datasets/correlated_plane')),
logtalk_load(dimension_reduction_protocols('test_datasets/high_dimensional_measurements')).
Learning a reducer
| ?- pca_projection::learn(correlated_plane, DimensionReducer).
| ?- pca_projection::learn(correlated_plane, DimensionReducer, [n_components(1), feature_scaling(false), maximum_iterations(200), tolerance(1.0e-7)]).
Transforming new instances
| ?- pca_projection::learn(high_dimensional_measurements, DimensionReducer),
pca_projection::transform(DimensionReducer, [f1-0.9, f2-1.1, f3-1.0, f4-2.0, f5-2.2, f6-2.1], ReducedInstance).
| ?- pca_projection::learn(correlated_plane, DimensionReducer, [n_components(1)]),
pca_projection::transform(DimensionReducer, [x-1.0, y-2.0, z-3.0], ReducedInstance).
Exporting and reusing the reducer
| ?- pca_projection::learn(correlated_plane, DimensionReducer, [n_components(1)]),
pca_projection::export_to_file(correlated_plane, DimensionReducer, reducer, 'pca_reducer.pl').
| ?- logtalk_load('pca_reducer.pl'),
reducer(Reducer),
pca_projection::transform(Reducer, [x-1.0, y-2.0, z-3.0], ReducedInstance).
Dimension reducer representation
The learned dimension reducer is represented by a compound term with the functor chosen by the implementation and arity 4. For example:
pca_reducer(Encoders, Components, ExplainedVariances, Diagnostics)
Where:
Encoders: List of continuous attribute encoders storing attribute name, mean, and scale.
Components: List of principal direction vectors in descending variance order.
ExplainedVariances: List of eigenvalues matching the extracted components.
Diagnostics: Learned reducer metadata including the effective training options and learned model details.
When exported using export_to_clauses/4 or export_to_file/4,
this reducer term is serialized directly as the single argument of the
generated predicate clause so that the exported model can be loaded and
reused as-is.
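To show how the Encoders and Components arguments described above could drive a projection, here is a minimal Python sketch. It is not the library's transform/3 implementation. The function name `transform`, the tuple layout of each encoder, the 1-based component numbering, and the use of ValueError for rejected values are assumptions made for this example.

```python
def transform(encoders, components, instance):
    """Project an instance (list of (name, value) pairs) onto the components.

    encoders:   list of (attribute_name, mean, scale) tuples
    components: list of direction vectors, one weight per encoder
    """
    values = dict(instance)
    standardized = []
    for name, mean, scale in encoders:
        value = values.get(name)
        # Reject missing or nonnumeric values (booleans are not treated as numbers).
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"attribute {name!r} is missing or nonnumeric")
        standardized.append((value - mean) / scale)
    # One (component_N, Value) pair per direction; numbering from 1 is an assumption.
    return [
        (f"component_{k}", sum(w * z for w, z in zip(direction, standardized)))
        for k, direction in enumerate(components, start=1)
    ]
```

Keeping the mean and scale next to each attribute name means a reducer term loaded from an exported file carries everything needed to preprocess a new instance the same way the training data was preprocessed.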
References
Pearson, K. (1901) - “On lines and planes of closest fit to systems of points in space”.
Hotelling, H. (1933) - “Analysis of a complex of statistical variables into principal components”.