kernel_pca_projection

Kernel Principal Component Analysis reducer for continuous datasets. The library implements the dimension_reducer_protocol defined in the dimension_reduction_protocols library. It learns a nonlinear projection by centering the training data, optionally standardizing continuous attributes, building a centered kernel Gram matrix, and extracting deterministic principal directions in sample space using portable power iteration with deflation.

API documentation

Open the ../../apis/library_index.html#kernel_pca_projection link in a web browser.

Loading

To load this library, load the loader.lgt file:

| ?- logtalk_load(kernel_pca_projection(loader)).

Testing

To test this library's predicates, load the tester.lgt file:

| ?- logtalk_load(kernel_pca_projection(tester)).

Features

  • Continuous Datasets: Accepts datasets containing only continuous attributes. Missing or nonnumeric values are rejected.

  • Centering and Optional Scaling: Centers all attributes and optionally standardizes them before evaluating kernels.

  • Configurable Shortfall Handling: Lets callers choose whether a component-extraction shortfall raises an error or truncates the learned reducer with explicit diagnostics.

  • Supported Kernels: Supports linear, polynomial with non-negative offset, and radial basis function kernels through the kernel/1 option.

  • Shared Gram-Centering Helpers: Delegates training Gram-matrix centering and out-of-sample Gram-vector centering to the shared linear_algebra library.

  • Portable Eigensolver: Uses deterministic power iteration with deflation instead of backend-specific linear algebra libraries.

  • Projection API: Transforms a new instance into a list of component_N-Value pairs using centered kernel evaluations against the training rows.

  • Model Export: Learned reducers can be exported as predicate clauses or written to a file.
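
The three supported kernel functions can be sketched as follows. This is a minimal Python illustration of the math only, not the library's Logtalk code; the function names are hypothetical.

```python
import math

def linear_kernel(x, y):
    # k(x, y) = <x, y>
    return sum(xi * yi for xi, yi in zip(x, y))

def polynomial_kernel(x, y, degree, gamma, coef0):
    # k(x, y) = (gamma * <x, y> + coef0) ** degree
    # Degree and Gamma must be positive; Coef0 must be non-negative.
    return (gamma * linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma):
    # k(x, y) = exp(-gamma * ||x - y||^2), with positive Gamma
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)

print(linear_kernel([1.0, 2.0], [3.0, 4.0]))                    # prints: 11.0
print(polynomial_kernel([1.0, 2.0], [3.0, 4.0], 2, 1.0, 1.0))   # prints: 144.0
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], 0.5))                  # prints: 1.0
```

Note that an RBF kernel always evaluates to 1.0 when both arguments are the same point, which is why the diagonal of an RBF Gram matrix is all ones before centering.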

Options

The learn/3 predicate accepts the following options:

  • n_components/1: Number of kernel principal components to extract. Requests that exceed SampleCount - 1 raise domain_error(component_count, Requested-Maximum). The default is 2.

  • feature_scaling/1: Whether to standardize continuous attributes before evaluating kernels. Options: true (default) or false.

  • shortfall_policy/1: Controls what happens when the centered kernel Gram matrix yields fewer numerically significant components than requested. Options: truncate (default), which returns a reducer with fewer components and records a shortfall(truncated(Requested, Learned, ResidualEigenvalue, Tolerance)) diagnostic, or error, which raises domain_error(component_count, Requested-Learned).

  • kernel/1: Kernel specification. Supported values are linear (default), polynomial(Degree, Gamma, Coef0) with positive Degree, positive Gamma, and non-negative Coef0, and rbf(Gamma) with positive Gamma.

  • maximum_iterations/1: Maximum number of power-iteration steps used when estimating each dual principal direction. The default is 1000.

  • tolerance/1: Positive convergence tolerance used both for power-iteration stopping and for deciding when deflated eigenvalues are negligible. The default is 1.0e-8.
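
The interplay of maximum_iterations/1 and tolerance/1 with the eigensolver can be illustrated by a minimal Python sketch of power iteration with deflation, under the assumption of a symmetric positive semi-definite Gram matrix (as produced by kernel centering). This is not the library's code; it only shows the technique named above.

```python
def power_iteration(matrix, maximum_iterations=1000, tolerance=1.0e-8):
    # Estimate the dominant eigenvalue/eigenvector of a symmetric
    # positive semi-definite matrix by repeated multiplication and
    # normalization, stopping when the vector stabilizes.
    n = len(matrix)
    vector = [float(i + 1) for i in range(n)]
    eigenvalue = 0.0
    for _ in range(maximum_iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n))
                   for i in range(n)]
        norm = sum(c * c for c in product) ** 0.5
        if norm < tolerance:
            break  # remaining eigenvalues are numerically negligible
        next_vector = [c / norm for c in product]
        converged = max(abs(a - b) for a, b in zip(next_vector, vector)) < tolerance
        vector = next_vector
        eigenvalue = norm
        if converged:
            break
    return eigenvalue, vector

def deflate(matrix, eigenvalue, vector):
    # Remove the extracted direction: M' = M - lambda * v * v^T,
    # so the next power iteration finds the next-largest eigenvalue.
    n = len(matrix)
    return [[matrix[i][j] - eigenvalue * vector[i] * vector[j]
             for j in range(n)] for i in range(n)]

gram = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 3 and 1
lam1, v1 = power_iteration(gram)
lam2, v2 = power_iteration(deflate(gram, lam1, v1))
print(round(lam1, 6), round(lam2, 6))   # prints: 3.0 1.0
```

After each deflation, eigenvalues below the tolerance are treated as negligible, which is what triggers the shortfall_policy/1 handling when fewer significant components remain than were requested.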

Usage

The following examples use the sample datasets shipped with the dimension_reduction_protocols library:

| ?- logtalk_load(dimension_reduction_protocols('test_datasets/correlated_plane')).

Learning a reducer

| ?- kernel_pca_projection::learn(correlated_plane, DimensionReducer).

| ?- kernel_pca_projection::learn(correlated_plane, DimensionReducer, [n_components(1), kernel(rbf(0.25))]).

Transforming new instances

| ?- kernel_pca_projection::learn(correlated_plane, DimensionReducer, [n_components(2), kernel(rbf(0.25))]),
     kernel_pca_projection::transform(DimensionReducer, [x-2.0, y-4.0, z-6.0], ReducedInstance).

Exporting and reusing the reducer

| ?- kernel_pca_projection::learn(correlated_plane, DimensionReducer, [n_components(1)]),
     kernel_pca_projection::export_to_file(correlated_plane, DimensionReducer, reducer, 'kernel_pca_reducer.pl').

| ?- logtalk_load('kernel_pca_reducer.pl'),
     reducer(Reducer),
     kernel_pca_projection::transform(Reducer, [x-1.0, y-2.0, z-3.0], ReducedInstance).

Dimension reducer representation

The learned dimension reducer is represented by a compound term of arity 7 whose functor is chosen by the implementation. For example:

kernel_pca_reducer(Encoders, TrainingRows, RowMeans, TotalMean, Components, ExplainedVariances, Diagnostics)

Where:

  • Encoders: List of continuous attribute encoders storing attribute name, mean, and scale.

  • TrainingRows: Encoded training rows used when evaluating kernels for new instances.

  • RowMeans: Per-training-row kernel means used for centering out-of-sample kernel vectors.

  • TotalMean: Global kernel mean used for centering both the training Gram matrix and new kernel vectors.

  • Components: List of normalized dual projection vectors in descending variance order.

  • ExplainedVariances: List of kernel Gram matrix eigenvalues matching the extracted components.

  • Diagnostics: Learned metadata including the effective training options, kernel preprocessing, sample count, explained variances, and optional truncate-mode shortfall details.

When exported using export_to_clauses/4 or export_to_file/4, this reducer term is serialized directly as the single argument of the generated predicate clause so that the exported model can be loaded and reused as-is.
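
The roles of RowMeans, TotalMean, and Components can be made concrete with a short Python sketch of the double-centering math (the library itself delegates this to the shared linear_algebra library; the function names here are illustrative, not part of any API).

```python
def center_gram_matrix(gram):
    # Double-center the training Gram matrix K:
    # K~[i][j] = K[i][j] - RowMeans[i] - RowMeans[j] + TotalMean
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    centered = [[gram[i][j] - row_means[i] - row_means[j] + total_mean
                 for j in range(n)] for i in range(n)]
    return centered, row_means, total_mean

def center_kernel_vector(kernel_vector, row_means, total_mean):
    # Center an out-of-sample kernel vector k, where k[i] = k(x_new, x_i),
    # against the stored training statistics.
    n = len(kernel_vector)
    vector_mean = sum(kernel_vector) / n
    return [k - row_means[i] - vector_mean + total_mean
            for i, k in enumerate(kernel_vector)]

def project(components, centered_vector):
    # Each component_N value is the dot product of a dual projection
    # vector with the centered kernel vector.
    return [sum(a * k for a, k in zip(component, centered_vector))
            for component in components]

gram = [[4.0, 2.0], [2.0, 3.0]]
centered, row_means, total_mean = center_gram_matrix(gram)
# Every row of the centered Gram matrix sums to zero.
print([round(sum(row), 10) for row in centered])   # prints: [0.0, 0.0]

centered_kv = center_kernel_vector([2.0, 1.0], row_means, total_mean)
print(project([[0.5, -0.5]], centered_kv))         # prints: [0.25]
```

Centering guarantees that every row and column of the centered Gram matrix sums to zero, which is the sample-space analogue of subtracting the mean in ordinary PCA.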

References

  1. Schölkopf, B., Smola, A., and Müller, K.-R. (1998) - “Nonlinear component analysis as a kernel eigenvalue problem”.

  2. Shawe-Taylor, J. and Cristianini, N. (2004) - “Kernel Methods for Pattern Analysis”.