sequential_pattern_mining_protocols
This library provides support entities for sequential pattern mining
algorithms. Ordered sequence datasets are represented as objects
implementing the sequence_dataset_protocol protocol. The generic
pattern_miner_protocol protocol and the pattern_miner_common
category used by concrete miners are loaded from the
pattern_mining_protocols core library.
The sequential_pattern_mining_common category builds on that generic
core with sequential-specific helpers for dataset validation, support
count accumulation, and sequential pattern ordering/filtering.
This library also provides reusable sequence smoke-test datasets and a small smoke-test suite.
API documentation
Open the ../../apis/library_index.html#sequential_pattern_mining_protocols link in a web browser.
Loading
To load all entities in this library, load the loader.lgt file:
| ?- logtalk_load(sequential_pattern_mining_protocols(loader)).
Testing
To run the library smoke tests, load the tester.lgt file:
| ?- logtalk_load(sequential_pattern_mining_protocols(tester)).
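Once dataset objects are loaded, the standard Logtalk reflection predicate conforms_to_protocol/2 can be used to see which loaded entities conform to the dataset protocol. The query below is only a sketch: it assumes that loading the tester also loads the sample datasets, which is not stated here, so it may return no solutions in a given session.

```logtalk
% Enumerate, on backtracking, the loaded entities that conform to
% sequence_dataset_protocol. Assumption: the sample datasets have
% already been loaded (for example, by the tester).
| ?- conforms_to_protocol(Dataset, sequence_dataset_protocol).
```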
Test datasets
The test_datasets directory includes the following sample sequence
datasets:
clickstream_sequences.lgt: A compact sequence dataset with repeated prefixes intended for sequential-pattern smoke tests.
prefix_ladder_sequences.lgt: A small ladder-shaped dataset of singleton events intended for exact baseline checks across sequential mining algorithms.
same_event_vs_next_event_sequences.lgt: A compact dataset intended to distinguish same-event extensions from next-event extensions.
repeated_embedding_sequences.lgt: A dataset where the same subsequence admits multiple embeddings inside a single sequence, intended for support-count semantics checks.
border_threshold_sequences.lgt: A compact dataset with patterns just above and below typical support thresholds, intended for pruning and threshold regression tests.
closure_sequences.lgt: A compact dataset intended for closed-pattern tests where some frequent patterns share the same support as one of their supersequences.
dense_overlap_sequences.lgt: A denser dataset with overlapping subsequences and mixed singleton and multi-item events, intended for overlap-heavy mining scenarios.
branching_sequences.lgt: A dataset with a common prefix and several competing branches, intended for candidate-generation and branching coverage.
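To illustrate what the repeated-embedding and same-event versus next-event datasets are meant to exercise, here is a minimal, self-contained sketch of support counting. It is not the library's implementation: the support_sketch object and its predicates are hypothetical, sequences are assumed to be lists of events, events are assumed to be lists of items, and support is assumed to follow the common convention that each sequence contributes at most once no matter how many embeddings it contains.

```logtalk
% Hypothetical sketch; none of these names belong to the library.
:- object(support_sketch).

    :- public(support/3).
    % support(+Pattern, +Sequences, -Count)
    % Count is the number of sequences that contain Pattern at least once.
    support(_, [], 0).
    support(Pattern, [Sequence| Sequences], Count) :-
        support(Pattern, Sequences, Count0),
        (   contains(Pattern, Sequence) ->
            % if-then-else commits to the first embedding found, so
            % a sequence with several embeddings is counted only once
            Count is Count0 + 1
        ;   Count = Count0
        ).

    % contains(+Pattern, +Sequence)
    % each pattern event must be a subset of a distinct sequence event,
    % with the events matched in order
    contains([], _).
    contains([PatternEvent| PatternEvents], [Event| Events]) :-
        (   subset(PatternEvent, Event),
            contains(PatternEvents, Events)
        ;   contains([PatternEvent| PatternEvents], Events)
        ).

    subset([], _).
    subset([Item| Items], Event) :-
        member(Item, Event),
        subset(Items, Event).

    member(Item, [Item| _]).
    member(Item, [_| Items]) :-
        member(Item, Items).

:- end_object.
```

With this sketch loaded, the next-event pattern [[a],[b]] is found in the first two sequences below (the first one embeds it three times but counts once), so Count is 2, while the same-event pattern [[a,b]] is found in none of them, so Count is 0:

```logtalk
| ?- support_sketch::support([[a],[b]], [[[a],[b],[a],[b]], [[a,c],[b]], [[b],[a]]], Count).
| ?- support_sketch::support([[a,b]], [[[a],[b],[a],[b]], [[a,c],[b]], [[b],[a]]], Count).
```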
The directory also includes invalid fixtures useful for validation and error-handling tests:
invalid_undeclared_item_sequences.lgt: Uses an item not listed in the declared item domain.
invalid_unsorted_itemset_sequences.lgt: Uses an event with items not in canonical sorted order.
invalid_duplicate_item_in_event_sequences.lgt: Uses an event with a duplicate item.
invalid_empty_event_sequences.lgt: Uses an empty event inside a sequence.
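The four invalid fixtures correspond to four checks that a validator has to make. The sketch below shows one way those checks could be written. It is not the validation code provided by sequential_pattern_mining_common: the sequence_validation_sketch object and its predicates are hypothetical, and "canonical sorted order" is assumed here to mean the standard order of terms.

```logtalk
% Hypothetical sketch; none of these names belong to the library.
:- object(sequence_validation_sketch).

    :- public(valid_sequence/2).
    % valid_sequence(+Items, +Sequence)
    % true when every event in Sequence is valid for the item domain Items
    valid_sequence(_, []).
    valid_sequence(Items, [Event| Events]) :-
        valid_event(Items, Event),
        valid_sequence(Items, Events).

    valid_event(Items, Event) :-
        % rejects an empty event
        Event = [_| _],
        % sort/2 orders the list and removes duplicates, so this only
        % succeeds for an event that is already sorted and duplicate-free
        sort(Event, Event),
        % rejects items outside the declared item domain
        declared(Event, Items).

    declared([], _).
    declared([Item| Rest], Items) :-
        once(member(Item, Items)),
        declared(Rest, Items).

    member(Item, [Item| _]).
    member(Item, [_| Items]) :-
        member(Item, Items).

:- end_object.
```

With the item domain [a,b,c], the first query below succeeds and the remaining four fail, one per invalid fixture category (undeclared item, unsorted event, duplicate item, empty event):

```logtalk
| ?- sequence_validation_sketch::valid_sequence([a,b,c], [[a],[a,b]]).
| ?- sequence_validation_sketch::valid_sequence([a,b,c], [[a,d]]).
| ?- sequence_validation_sketch::valid_sequence([a,b,c], [[b,a]]).
| ?- sequence_validation_sketch::valid_sequence([a,b,c], [[a,a]]).
| ?- sequence_validation_sketch::valid_sequence([a,b,c], [[a],[]]).
```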