This page provides a MATLAB re-implementation of the methods described in Cawley and Talbot (2007), which approximately reproduces the experimental results given in that paper. The software requires the fminunc function from the MATLAB Optimization Toolbox; if you do not have this toolbox, you will need to alter the @graddesc object to use a different gradient descent optimiser. Note that the software is of "research quality": it has no documentation and is provided essentially unsupported.
Improved Experimental Method
The software provided will evaluate the LS-SVM and LS-SVM-BR (@l2lssvm in the software) over the suite of thirteen benchmark datasets used in the original paper. The results are not quite the same, as a different gradient descent optimiser is used, but the overall pattern is similar. The results provided below were obtained using a slightly improved methodology. In order to make sure that any difference in performance between the standard spherical RBF kernel and the elliptical ARD kernel is due to over-fitting the model selection criterion, rather than to local minima in the cost function, the ARD kernel is optimised twice: once starting from the equivalent optimal RBF kernel (so the model selection criterion cannot be worse than for the RBF kernel) and once from the default value (0.125). The solution with the lowest value of PRESS is then used. These results should be regarded as more reliable than those given in Cawley and Talbot (2007), as a more thoroughly tested gradient descent function was used and the experimental method is better.
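The double-restart strategy above can be sketched in a few lines. The following Python sketch (an illustrative re-implementation, not part of the MATLAB package; all function names and the ridge parameter are my own assumptions) computes the closed-form leave-one-out PRESS statistic for an LS-SVM / kernel ridge model, then optimises the ARD kernel parameters from both starting points and keeps whichever run attains the lower PRESS:

```python
import numpy as np
from scipy.optimize import minimize

def ard_kernel(X1, X2, log_eta):
    # Elliptical ARD kernel with per-feature inverse length-scales eta = exp(log_eta);
    # optimising in log-space keeps the length-scales positive.
    eta = np.exp(log_eta)
    d = (X1[:, None, :] - X2[None, :, :]) ** 2
    return np.exp(-np.sum(eta * d, axis=2))

def press(log_eta, X, y, lam=1e-2):
    # Closed-form leave-one-out PRESS for LS-SVM / kernel ridge regression:
    # with H = K + lam*I and alpha = H^{-1} y, the i-th leave-one-out
    # residual is e_i = alpha_i / (H^{-1})_{ii}.
    K = ard_kernel(X, X, log_eta)
    H = K + lam * np.eye(len(y))
    Hinv = np.linalg.inv(H)
    alpha = Hinv @ y
    e = alpha / np.diag(Hinv)
    return np.sum(e ** 2)

def select_ard(X, y, log_eta_rbf):
    # Optimise the ARD kernel twice: once from the optimal spherical RBF
    # solution (all length-scales tied to the RBF value) and once from the
    # default value 0.125, keeping the run with the lowest PRESS.
    starts = [np.full(X.shape[1], log_eta_rbf),
              np.full(X.shape[1], np.log(0.125))]
    best = min((minimize(press, s, args=(X, y)) for s in starts),
               key=lambda r: r.fun)
    return best.x, best.fun
```

Starting one run from the tied RBF solution guarantees the selected model's criterion value is never worse than the RBF model's, which is what makes the comparison in the table below a fair test of over-fitting rather than of optimiser luck.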
The results are shown in the table below; for each dataset, the best mean error rate is shown in bold and the worst is shown underlined. Note that the results for the RBF kernel are generally better than those for the ARD kernel. As the RBF kernel is a special case of the ARD kernel, this illustrates that over-fitting the model selection criterion is a genuine problem when using kernel learning methods where there are many kernel parameters to be determined. The LS-SVM-BR generally out-performs the LS-SVM for the ARD kernel, in some cases by a very substantial margin (e.g. heart), which shows that Bayesian regularisation of the hyper-parameters is beneficial. An even better approach to this problem is given in Cawley and Talbot (2014), where the kernel parameters are treated as parameters, rather than hyper-parameters.
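The claim that the spherical RBF kernel is a special case of the ARD kernel is easy to verify directly: tying all the ARD length-scales to a single shared value recovers the RBF kernel exactly. The short Python sketch below (function names are my own, not taken from the accompanying software) checks this numerically:

```python
import numpy as np

def ard_kernel(X1, X2, eta):
    # Elliptical ARD kernel: one inverse squared length-scale per input feature.
    d = (X1[:, None, :] - X2[None, :, :]) ** 2
    return np.exp(-np.sum(eta * d, axis=2))

def rbf_kernel(X1, X2, eta):
    # Spherical RBF kernel: a single inverse squared length-scale shared by all features.
    d = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=2)
    return np.exp(-eta * d)

X = np.random.randn(5, 4)
eta = 0.25
# Tying all ARD length-scales to eta recovers the spherical RBF kernel exactly.
assert np.allclose(ard_kernel(X, X, np.full(4, eta)), rbf_kernel(X, X, eta))
```

Because the RBF model is always available to the ARD optimiser, any deterioration in test error for the ARD kernel must come from the richer parameterisation over-fitting the selection criterion, not from reduced expressive power.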
- MATLAB software (jmlr2007a.zip)
There are eight experiments in all. Run the MATLAB script run_experiment in each directory, and the results will be written to a file called summary.txt in the directory [benchmark]/results/, where [benchmark] is the name of the benchmark dataset. The key experiments are:
- experiment001 - RBF LS-SVM
- experiment002 - RBF LS-SVM-BR
- experiment007 - ARD LS-SVM
- experiment008 - ARD LS-SVM-BR
The reproduction of the results presented here was carried out on the High Performance Computing Cluster supported by the Research and Specialist Computing Support service at the University of East Anglia.
- G. C. Cawley and N. L. C. Talbot, "Preventing over-fitting in model selection via Bayesian regularisation of the hyper-parameters", Journal of Machine Learning Research, volume 8, pages 841-861, April 2007. (pdf)
- G. C. Cawley and N. L. C. Talbot, "Kernel Learning at the First Level of Inference", Neural Networks, volume 53, pages 69-80, May 2014. (preprint)