Sequence Boosting Toolbox


The pboost toolbox is a set of command-line programs and a Matlab wrapper for mining frequent subsequences and classifying sequences. For our purposes, a sequence is defined as an ordered list of sets of discrete numbers. (If every set contains exactly one element, the sequence is a string.) This definition is flexible enough to model a number of interesting problems and has been used successfully for human action classification in video data.
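To make the definition concrete, here is a minimal C++ sketch of such a sequence and of a subsequence test. It is an illustration only: the type names ItemSet and Sequence and the functions below are hypothetical and are not the toolbox's internal representation. The containment rule is also an assumption, namely the usual one from sequential pattern mining, where each set of the pattern must be contained in some set of the sequence, in order, with gaps allowed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical types for illustration; not the toolbox's own data structures.
using ItemSet = std::set<int>;           // one set of discrete numbers
using Sequence = std::vector<ItemSet>;   // an ordered list of such sets

// True if every element of `sub` also appears in `super`.
// std::includes needs sorted ranges, which std::set guarantees.
bool isSubset(const ItemSet& sub, const ItemSet& super) {
    return std::includes(super.begin(), super.end(), sub.begin(), sub.end());
}

// True if `pattern` occurs in `seq` as a subsequence (assumed definition):
// the pattern sets are matched in order, each inside some set of `seq`,
// and sets of `seq` may be skipped. Matching each pattern set at the
// earliest possible position is sufficient for this test.
bool containsSubsequence(const Sequence& seq, const Sequence& pattern) {
    std::size_t next = 0;  // index of the next pattern set still to be matched
    for (const ItemSet& element : seq) {
        if (next == pattern.size()) break;
        assert(!pattern[next].empty() && "pattern sets are assumed non-empty");
        if (isSubset(pattern[next], element)) ++next;
    }
    return next == pattern.size();
}

// True if every set holds exactly one element, i.e. the sequence is a string.
bool isString(const Sequence& seq) {
    return std::all_of(seq.begin(), seq.end(),
                       [](const ItemSet& s) { return s.size() == 1; });
}
```

For example, under these assumptions the sequence ({1,2}, {3}, {2,4}) contains the pattern ({1}, {4}) but not ({4}, {1}), because the order of the sets matters.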

The pboost classifier checks whether certain subsequences are present in the sequence to be tested. The subsequences to check are determined optimally by discriminative subsequence mining. The resulting classification function is interpretable because only a small number of subsequences determine the classification decision. Hence, for some applications, subsequence mining offers an alternative to implicitly represented feature spaces (e.g. string and sequence kernels), which do not allow the resulting classifier to be interpreted.
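To illustrate why such a classifier is easy to read, here is a minimal C++ sketch for the string special case. It assumes that the decision is a weighted vote over pattern-presence tests, which is a common form for boosting classifiers; the actual pboost decision function and model format may differ. All names (Symbols, PatternRule, occursIn, score, classify) are hypothetical and not part of the toolbox.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not the toolbox's API.
using Symbols = std::vector<int>;  // a string: every set has exactly one element

// One rule of the assumed classifier: a pattern, a vote weight, and the
// class (+1 or -1) the rule votes for when the pattern is present.
struct PatternRule {
    Symbols pattern;
    double weight;
    int polarity;
};

// True if `pattern` occurs in `seq` in order, with gaps allowed.
bool occursIn(const Symbols& seq, const Symbols& pattern) {
    std::size_t next = 0;  // index of the next pattern symbol to match
    for (int symbol : seq) {
        if (next < pattern.size() && symbol == pattern[next]) ++next;
    }
    return next == pattern.size();
}

// Weighted vote: a rule contributes +weight * polarity when its pattern is
// present and -weight * polarity when it is absent (assumed form).
double score(const std::vector<PatternRule>& rules, const Symbols& seq) {
    double total = 0.0;
    for (const PatternRule& rule : rules) {
        assert(rule.polarity == 1 || rule.polarity == -1);
        const int presence = occursIn(seq, rule.pattern) ? 1 : -1;
        total += rule.weight * rule.polarity * presence;
    }
    return total;
}

// Predicted class label: +1 if the vote is non-negative, otherwise -1.
int classify(const std::vector<PatternRule>& rules, const Symbols& seq) {
    return score(rules, seq) >= 0.0 ? 1 : -1;
}
```

Because each rule names an explicit pattern, the decision for a given sequence can be explained by listing which patterns were found and how much each one contributed to the vote.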

On this page, we provide instructions and source code, as well as a number of real-world example data sets, in order to foster the discussion and adoption of the subsequence mining methodology. Please see the publications section for published papers.

Features and Demo

The pboost toolbox includes source code for frequent subsequence mining (pspan), subsequence boosting (pboost) and classifier testing (ptest), together with a Matlab wrapper.

All of the code is written in C++ and makes use of libboost, the GETFEM GMM++ matrix library, the COIN-OR Open Solver Interface library and the COIN-OR Linear Programming Solver (CLP). For your convenience, all of these libraries are bundled in the download package below, which makes recompilation easy. Statically compiled binaries for Linux x86 and x86-64 are also included, so recompiling is optional.

The toolbox includes real-world data sets for testing purposes. To try it out, run the included demo.sh file in the dataset/kth-dataset/ directory.


Below is the manpage documentation included in the distribution.

Program   Description                               Documentation
pspan     PrefixSpan frequent subsequence mining    pspan.txt, pspan.pdf
pboost    Subsequence Boosting                      pboost.txt, pboost.pdf
ptest     Classifier test program                   ptest.txt, ptest.pdf


The pboost toolbox package includes the source code and pre-compiled binaries for the Linux/x86 and Linux/x86-64 architectures. Also included are two data sets: one from human action classification in videos, and a toy data set of textual descriptions of country flags. See the included demo.sh file for how they are used.

Distribution: source code, precompiled binaries and demo file

License: The software is licensed under the GNU General Public License, version 2. A copy of the license document is included in the distribution.

Installation: The distribution includes statically compiled binaries for your convenience. To compile the programs or the Matlab wrappers yourself, adjust the variables in the Makefile.options file, especially the MATLABROOT variable. After editing the file accordingly, the programs should compile on any recent Linux system. If you run into the frequently encountered error about GCC_3.3 not being found when you run the mex functions, please refer to this discussion at the Mathworks forums.



If you have comments or questions, please feel free to contact me. Thanks!