gboost
Graph Boosting Toolbox for Matlab

Introduction

Discrete graph
A connected, undirected and labeled graph.

The gboost toolbox is a framework for classification of connected, undirected, labeled graphs. A typical graph is shown on the right: each node and edge is labeled by a discrete value.

The gboost classifiers check for the presence of certain subgraphs in the larger graph. The subgraphs being checked are optimally determined by discriminative subgraph mining. The classification hypotheses is interpretable because only a small number of subgraphs are used to determine the overall classification decision.

On this page, we provide instructions and source code in order to foster the adoption of the graph boosting methodology. Graph Boosting has been used successfully in chemical compound classification, natural language processing and computer vision. Please see the publications section for published papers.

Features and Demo

The gboost toolbox includes code for the following functionalities:

Most of the code is written in C++ with MEX Matlab wrappers.

The following screen capture shows the output when the included example.m file from the distribution is run.

>> example
Loading example graphs...
   96 training samples
   69 test samples

We train a 2-class graph boosting classifier.
the settings are:
   nu = 0.2, the LPBoost nu-parameter controlling training accuracy
   conv_epsilon = 0.05, the LPBoost convergence tolerance parameter
   max_col = 25, generate 25 hypotheses in each iteration (multiple pricing)

Please press return to start training...
=== GLPBOOST = BEGIN = 2-class graph based LPBoosting ===
LPBoost iter 1
Starting gspan-boost run...
(...)
gSpan returned 25 subgraphs
   optimality: 0.34279 <= 0 + 0.05 ?
(...)
gSpan returned 25 subgraphs
   optimality: 0.097308 <= 0.049357 + 0.05 ?
LPBoost optimality reached after 10 iterations.
   (no improvements left)
=== END ===
The classifier has been trained successfully.
There are 161 active subgraph stumps.
Please press return to test the classifier...
   test accuracy = 0.75362
   test ROC AUC = 0.79899
   test ROC EER = 0.75676
>>

Download

The gboost toolkit for Matlab can be downloaded below. The package includes the source code, pre-compiled binaries for the Linux/x86 and the Linux/x86-64 architectures. Also included is a set of graphs from the CPDB chemical database to test the graph boosting functionality.

Distribution: source code, precompiled binaries and demo file

License: The software is dual-licensed under both the GNU General Public License, version 2 and the Mozilla Public License, version 1.1. A copy of the license documents is included in the distribution. The dual-licensing allows you to pick any of the two licenses and thus makes it easier to combine this software with other toolboxes.

Installation: the distribution includes a Makefile.options file which specifies the MATLABROOT variable. After editing the file accordingly, the program should compile on any recent Linux system. If you have the frequently encountered problem complaining about GCC_3.3 not being found when you run the mex functions, please refer to this discussion at the Mathworks forums.

Publications

Contact

If you have comments or questions, please feel free to contact me. Thanks!