probabilistic_pca_projection
Probabilistic Principal Component Analysis (PPCA) reducer for continuous
datasets. The library implements the dimension_reducer_protocol
defined in the dimension_reduction_protocols library. It learns a
linear latent-variable projection by centering the training data,
optionally standardizing continuous attributes, estimating the sample
covariance matrix, deterministically extracting the leading eigenvectors
using portable power iteration with deflation, and converting them into
the closed-form maximum-likelihood PPCA loading matrix and posterior
latent projection.
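The pipeline described above can be sketched numerically. The following is a minimal NumPy illustration of the underlying math only (power iteration with deflation, then the closed-form Tipping-Bishop conversion W = U(Λ - σ²I)^(1/2) with σ² set to the average discarded eigenvalue); it is not the library's implementation, and the function names are hypothetical:

```python
import numpy as np

def top_eigenpairs(C, k, max_iter=1000, tol=1.0e-8):
    """Leading eigenpairs of a symmetric PSD matrix via power
    iteration with deflation, using a deterministic start vector."""
    C = np.array(C, dtype=float)
    d = C.shape[0]
    values, vectors = [], []
    for _ in range(k):
        v = np.ones(d) / np.sqrt(d)           # deterministic start
        for _ in range(max_iter):
            w = C @ v
            norm = np.linalg.norm(w)
            if norm < tol:                    # residual spectrum negligible
                break
            w = w / norm
            if np.linalg.norm(w - v) < tol:   # converged
                v = w
                break
            v = w
        lam = float(v @ C @ v)                # Rayleigh quotient
        values.append(lam)
        vectors.append(v)
        C = C - lam * np.outer(v, v)          # deflate the found component
    return np.array(values), np.column_stack(vectors)

def ppca_loadings(C, k, tol=1.0e-8):
    """Closed-form ML PPCA (requires k < d): W = U (L - s2 I)^(1/2),
    with s2 the average of the d-k discarded eigenvalues, recovered
    from the trace without computing them explicitly."""
    d = C.shape[0]
    lams, U = top_eigenpairs(C, k, tol=tol)
    sigma2 = max((np.trace(C) - lams.sum()) / (d - k), 0.0)
    W = U * np.sqrt(np.maximum(lams - sigma2, 0.0))
    return W, sigma2
```

With k = d - 1 the recovered model covariance W Wᵀ + σ²I reproduces the input covariance exactly, since σ² then equals the single discarded eigenvalue.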
API documentation
Open the ../../apis/library_index.html#probabilistic_pca_projection link in a web browser.
Loading
To load this library, load the loader.lgt file:
| ?- logtalk_load(probabilistic_pca_projection(loader)).
Testing
To test this library's predicates, load the tester.lgt file:
| ?- logtalk_load(probabilistic_pca_projection(tester)).
Features
Continuous Datasets: Accepts datasets containing only continuous attributes. Missing or nonnumeric values are rejected.
Centering and Optional Scaling: Centers all attributes and optionally standardizes them before fitting the covariance model.
Probabilistic Latent Model: Estimates the PPCA loading matrix and isotropic observation noise variance from the learned covariance eigensystem.
Configurable Shortfall Handling: Lets callers choose whether a numerical-rank shortfall raises an error or returns a truncated reducer with explicit diagnostics.
Projection API: Transforms a new instance into posterior latent means returned as component_N-Value pairs.
Model Export: Learned reducers can be exported as predicate clauses or written to a file.
Options
The learn/3 predicate accepts the following options:
n_components/1: Number of latent dimensions to extract. Requests that exceed the structural PPCA limit min(FeatureCount, SampleCount - 1) raise domain_error(component_count, Requested-Maximum). The default is 2.
feature_scaling/1: Whether to standardize continuous attributes before fitting the covariance model. Options: true (default) or false.
shortfall_policy/1: Controls what happens when the covariance matrix yields fewer numerically significant components than requested after passing the structural PPCA bound above. Options: truncate (default), which returns a reducer with fewer components and records a shortfall(truncated(Requested, Learned, ResidualEigenvalue, Tolerance)) diagnostic, or error, which raises domain_error(component_count, Requested-Learned).
maximum_iterations/1: Maximum number of power-iteration steps used when estimating each covariance eigenvector. The default is 1000.
tolerance/1: Positive convergence tolerance used both for power-iteration stopping and for deciding when residual eigenvalues are negligible. The default is 1.0e-8.
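As a sketch of the two component-count checks described above, here is an illustrative Python rendering; the helper names are hypothetical, and comparing eigenvalues against tolerance times the largest eigenvalue is an assumption about the significance rule, not the library's documented behavior:

```python
def structural_maximum(n_features, n_samples):
    """Structural PPCA bound on the number of latent dimensions."""
    return min(n_features, n_samples - 1)

def check_component_request(requested, n_features, n_samples):
    """Mirror of the documented n_components/1 validation; the library
    raises domain_error(component_count, Requested-Maximum) instead."""
    maximum = structural_maximum(n_features, n_samples)
    if requested > maximum:
        raise ValueError(f"component_count: {requested} > {maximum}")
    return maximum

def significant_components(eigenvalues, tolerance):
    """Count numerically significant eigenvalues; the relative
    comparison against the largest eigenvalue is an assumption."""
    largest = max(eigenvalues)
    return sum(1 for value in eigenvalues if value > tolerance * largest)
```

Under the truncate policy, a count below the requested number of components would yield the shortfall diagnostic rather than an error.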
Usage
The following examples use the sample datasets shipped with the
dimension_reduction_protocols library:
| ?- logtalk_load(dimension_reduction_protocols('test_datasets/correlated_plane')).
Learning a reducer
| ?- probabilistic_pca_projection::learn(correlated_plane, DimensionReducer).
| ?- probabilistic_pca_projection::learn(correlated_plane, DimensionReducer, [n_components(1), feature_scaling(false), shortfall_policy(error)]).
Transforming new instances
| ?- probabilistic_pca_projection::learn(correlated_plane, DimensionReducer, [n_components(2)]),
probabilistic_pca_projection::transform(DimensionReducer, [x-2.0, y-4.0, z-6.0], ReducedInstance).
Exporting and reusing the reducer
| ?- probabilistic_pca_projection::learn(correlated_plane, DimensionReducer, [n_components(1)]),
probabilistic_pca_projection::export_to_file(correlated_plane, DimensionReducer, reducer, 'probabilistic_pca_reducer.pl').
| ?- logtalk_load('probabilistic_pca_reducer.pl'),
reducer(Reducer),
probabilistic_pca_projection::transform(Reducer, [x-1.0, y-2.0, z-3.0], ReducedInstance).
Dimension reducer representation
The learned dimension reducer is represented by a compound term with the functor chosen by the implementation and arity 6. For example:
probabilistic_pca_reducer(Encoders, Components, Loadings, NoiseVariance, ExplainedVariances, Diagnostics)
Where:
Encoders: List of continuous attribute encoders storing attribute name, mean, and scale.
Components: List of posterior latent projection vectors in descending explained-variance order.
Loadings: List of maximum-likelihood PPCA loading vectors aligned with the extracted latent dimensions.
NoiseVariance: Estimated isotropic observation noise variance.
ExplainedVariances: List of retained covariance eigenvalues matching the extracted latent dimensions.
Diagnostics: Learned metadata including the effective training options, sample count, retained explained variances, estimated noise variance, preprocessing details, and optional truncate-mode shortfall details.
When exported using export_to_clauses/4 or export_to_file/4,
this reducer term is serialized directly as the single argument of the
generated predicate clause so that the exported model can be loaded and
reused as-is.
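To make the posterior latent projection concrete, here is a minimal NumPy sketch (not the library's code; the function name is hypothetical) of the quantity behind the returned component_N-Value pairs, the posterior mean E[z|x] = (WᵀW + σ²I)⁻¹Wᵀ(x - μ) from Tipping and Bishop (1999), computed after applying the stored encoder means and scales:

```python
import numpy as np

def posterior_latent_mean(W, sigma2, means, scales, x):
    """Posterior latent mean E[z|x] for a PPCA model with loading
    matrix W (features x components) and noise variance sigma2."""
    x_std = (np.asarray(x, dtype=float) - means) / scales  # encoder step
    M = W.T @ W + sigma2 * np.eye(W.shape[1])              # k x k matrix
    return np.linalg.solve(M, W.T @ x_std)
```

For example, with a single loading column [2, 0], noise variance 1, zero means, and unit scales, the instance [2, 0] projects to the latent value 4/5.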
References
Tipping, M. E. and Bishop, C. M. (1999) - “Probabilistic Principal Component Analysis”.
Bishop, C. M. (2006) - “Pattern Recognition and Machine Learning”. Section 12.2.