kernel_pca_projection
Kernel Principal Component Analysis reducer for continuous datasets. The
library implements the dimension_reducer_protocol defined in the
dimension_reduction_protocols library and learns a nonlinear
projection by centering the training data, optionally standardizing
continuous attributes, building a centered kernel Gram matrix, and
extracting deterministic principal directions in sample space using
portable power iteration with deflation.
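The extraction pipeline described above (double-center the Gram matrix, then pull out dominant directions one at a time) can be illustrated with a small, library-independent sketch. This is plain Python under assumed conventions, not this library's code; the function names are illustrative only:

```python
# Illustrative sketch of kernel PCA component extraction by power
# iteration with deflation (assumptions, not this library's code).

def center_gram(K):
    # Double-center the Gram matrix:
    # K'[i][j] = K[i][j] - rowmean_i - rowmean_j + totalmean.
    n = len(K)
    row_means = [sum(row) / n for row in K]
    total_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - row_means[j] + total_mean
             for j in range(n)] for i in range(n)]

def power_iteration(K, maximum_iterations=1000, tolerance=1.0e-8):
    # Deterministic start vector; a basis vector is used because the
    # all-ones vector is annihilated by a double-centered Gram matrix.
    n = len(K)
    v = [1.0] + [0.0] * (n - 1)
    eigenvalue = 0.0
    for _ in range(maximum_iterations):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm < tolerance:      # matrix numerically exhausted
            return 0.0, v
        v = [x / norm for x in w]
        converged = abs(norm - eigenvalue) < tolerance
        eigenvalue = norm
        if converged:
            break
    return eigenvalue, v

def deflate(K, eigenvalue, v):
    # Remove an extracted component: K <- K - eigenvalue * v * v^T.
    n = len(K)
    return [[K[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
            for i in range(n)]

# Three 2D samples with a linear kernel Gram matrix.
X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9]]
K = [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]
Kc = center_gram(K)
lam1, v1 = power_iteration(Kc)
lam2, v2 = power_iteration(deflate(Kc, lam1, v1))
print(lam1, lam2)  # eigenvalues in descending order
```

Because the samples are nearly collinear, the first eigenvalue dominates and the second is small but nonzero; a truncate-style shortfall policy would compare such residual eigenvalues against the tolerance.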
API documentation
Open the ../../apis/library_index.html#kernel_pca_projection link in a web browser.
Loading
To load this library, load the loader.lgt file:
| ?- logtalk_load(kernel_pca_projection(loader)).
Testing
To test this library's predicates, load the tester.lgt file:
| ?- logtalk_load(kernel_pca_projection(tester)).
Features
Continuous Datasets: Accepts datasets containing only continuous attributes. Missing or nonnumeric values are rejected.
Centering and Optional Scaling: Centers all attributes and optionally standardizes them before evaluating kernels.
Configurable Shortfall Handling: Lets callers choose whether a component-extraction shortfall raises an error or truncates the learned reducer with explicit diagnostics.
Supported Kernels: Supports linear, polynomial with non-negative offset, and radial basis function kernels through the kernel/1 option.
Shared Gram-Centering Helpers: Delegates training Gram-matrix centering and out-of-sample Gram-vector centering to the shared linear_algebra library.
Portable Eigensolver: Uses deterministic power iteration with deflation instead of backend-specific linear algebra libraries.
Projection API: Transforms a new instance into a list of component_N-Value pairs using centered kernel evaluations against the training rows.
Model Export: Learned reducers can be exported as predicate clauses or written to a file.
Options
The learn/3 predicate accepts the following options:
n_components/1: Number of kernel principal components to extract. Requests that exceed SampleCount - 1 raise domain_error(component_count, Requested-Maximum). The default is 2.
feature_scaling/1: Whether to standardize continuous attributes before evaluating kernels. Options: true (default) or false.
shortfall_policy/1: Controls what happens when the centered kernel Gram matrix yields fewer numerically significant components than requested. Options: truncate (default), which returns a reducer with fewer components and records a shortfall(truncated(Requested, Learned, ResidualEigenvalue, Tolerance)) diagnostic, or error, which raises domain_error(component_count, Requested-Learned).
kernel/1: Kernel specification. Supported values are linear (default), polynomial(Degree, Gamma, Coef0) with positive Degree, positive Gamma, and non-negative Coef0, and rbf(Gamma) with positive Gamma.
maximum_iterations/1: Maximum number of power-iteration steps used when estimating each dual principal direction. The default is 1000.
tolerance/1: Positive convergence tolerance used both for power-iteration stopping and for deciding when deflated eigenvalues are negligible. The default is 1.0e-8.
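For reference, the three kernel families accepted by the kernel/1 option are conventionally defined as below. This is a plain-Python sketch assuming the textbook formulas (dot product, (Gamma * x . y + Coef0)^Degree, and exp(-Gamma * ||x - y||^2)); it is not this library's implementation:

```python
# Conventional kernel definitions (assumed formulas; illustrative only).
import math

def linear(x, y):
    # Dot product of the two encoded attribute vectors.
    return sum(a * b for a, b in zip(x, y))

def polynomial(degree, gamma, coef0):
    # (gamma * <x, y> + coef0) ** degree with a non-negative offset.
    return lambda x, y: (gamma * linear(x, y) + coef0) ** degree

def rbf(gamma):
    # exp(-gamma * squared Euclidean distance).
    return lambda x, y: math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

x, y = [1.0, 2.0], [2.0, 0.0]
print(linear(x, y))                   # 2.0
print(polynomial(2, 1.0, 1.0)(x, y))  # (2 + 1) ** 2 = 9.0
print(rbf(0.25)(x, y))                # exp(-0.25 * 5) = exp(-1.25)
```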
Usage
The following examples use the sample datasets shipped with the
dimension_reduction_protocols library:
| ?- logtalk_load(dimension_reduction_protocols('test_datasets/correlated_plane')).
Learning a reducer
| ?- kernel_pca_projection::learn(correlated_plane, DimensionReducer).
| ?- kernel_pca_projection::learn(correlated_plane, DimensionReducer, [n_components(1), kernel(rbf(0.25))]).
Transforming new instances
| ?- kernel_pca_projection::learn(correlated_plane, DimensionReducer, [n_components(2), kernel(rbf(0.25))]),
kernel_pca_projection::transform(DimensionReducer, [x-2.0, y-4.0, z-6.0], ReducedInstance).
Exporting and reusing the reducer
| ?- kernel_pca_projection::learn(correlated_plane, DimensionReducer, [n_components(1)]),
kernel_pca_projection::export_to_file(correlated_plane, DimensionReducer, reducer, 'kernel_pca_reducer.pl').
| ?- logtalk_load('kernel_pca_reducer.pl'),
reducer(Reducer),
kernel_pca_projection::transform(Reducer, [x-1.0, y-2.0, z-3.0], ReducedInstance).
Dimension reducer representation
The learned dimension reducer is represented by a compound term with the functor chosen by the implementation and arity 7. For example:
kernel_pca_reducer(Encoders, TrainingRows, RowMeans, TotalMean, Components, ExplainedVariances, Diagnostics)
Where:
Encoders: List of continuous attribute encoders storing attribute name, mean, and scale.
TrainingRows: Encoded training rows used when evaluating kernels for new instances.
RowMeans: Per-training-row kernel means used for centering out-of-sample kernel vectors.
TotalMean: Global kernel mean used for centering both the training Gram matrix and new kernel vectors.
Components: List of normalized dual projection vectors in descending variance order.
ExplainedVariances: List of kernel Gram matrix eigenvalues matching the extracted components.
Diagnostics: Learned metadata including the effective training options, kernel preprocessing, sample count, explained variances, and optional truncate-mode shortfall details.
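The role of the RowMeans and TotalMean fields during transform can be sketched as follows, assuming the standard out-of-sample centering convention for kernel PCA (a plain-Python illustration with hypothetical values; the function names are not this library's predicates):

```python
# Sketch of out-of-sample kernel-vector centering and projection
# (assumed convention; illustrative names and values only).

def center_kernel_vector(k, row_means, total_mean):
    # k[i] is the kernel between the new instance and training row i.
    k_mean = sum(k) / len(k)
    return [k[i] - row_means[i] - k_mean + total_mean
            for i in range(len(k))]

def project(components, k_centered):
    # One component_N-Value pair per dual projection vector.
    return [('component_%d' % (n + 1),
             sum(v * kc for v, kc in zip(component, k_centered)))
            for n, component in enumerate(components)]

# Hypothetical values for a reducer trained on three rows.
k = [1.0, 2.0, 3.0]
row_means = [0.1, 0.2, 0.3]
total_mean = 0.2
kc = center_kernel_vector(k, row_means, total_mean)
print(project([[1.0, 0.0, 0.0]], kc))
```

The same RowMeans and TotalMean used to center the training Gram matrix are reused here, which is what keeps new instances comparable to the training rows.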
When exported using export_to_clauses/4 or export_to_file/4,
this reducer term is serialized directly as the single argument of the
generated predicate clause so that the exported model can be loaded and
reused as-is.
References
Schölkopf, B., Smola, A., and Müller, K.-R. (1998) - “Nonlinear component analysis as a kernel eigenvalue problem”.
Shawe-Taylor, J. and Cristianini, N. (2004) - “Kernel Methods for Pattern Analysis”.