Support Vector Machine with GPU, Part II
In our last tutorial on SVM training with GPU, we mentioned that the data had to be pre-scaled with rpusvm-scale, and that the scaling of the prediction outcome had to be reversed afterwards. The latest RPUSVM removes this cumbersome procedure.
For example, we can work directly with the cadata data set from the LIBSVM site. Just load it into the R workspace with read.svm.data and apply the function rpusvm right away. The overhead of the implicit data scaling turns out to be negligible.
> cadata <- read.svm.data("cadata", fac=FALSE)
> x <- cadata$x; y <- cadata$y
> system.time(cadata.rpusvm <- rpusvm(x, y, type="eps-regression"))
........**.
user system elapsed
6.510 0.020 6.539
We can inspect the range of each attribute of cadata in the SVM model cadata.rpusvm. In particular, for a data set with N attributes, the x.bound component of cadata.rpusvm is a 2 × N matrix, with each column containing the lower and upper bounds of the corresponding attribute. Likewise, the y.bound component contains the lower and upper bounds of the response variable.
> cadata.rpusvm$x.bound
        [,1] [,2]  [,3] [,4]  [,5] [,6]  [,7]    [,8]
[1,]  0.4999    1     2    1     3    1 32.54 -124.35
[2,] 15.0001   52 39320 6445 35682 6082 41.95 -114.31
> cadata.rpusvm$y.bound
[,1]
[1,] 14999
[2,] 500001
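To make the role of the bounds concrete, here is a minimal sketch of how a 2 × N bounds matrix can drive the scaling. It assumes that each attribute is mapped linearly onto [-1, 1], in the manner of LIBSVM's svm-scale; the source does not spell out the exact formula used by rpusvm. The helper scale_to_range and the toy data are hypothetical and not part of RPUSVM.

```r
# Hypothetical helper, not part of RPUSVM: linearly map each column of x
# onto [lower, upper] using a 2 x N matrix of per-attribute bounds.
# Assumption: linear min-max scaling, as done by LIBSVM's svm-scale.
scale_to_range <- function(x, bound, lower = -1, upper = 1) {
  lo <- bound[1, ]               # per-attribute lower bound
  hi <- bound[2, ]               # per-attribute upper bound
  span <- hi - lo
  span[span == 0] <- 1           # avoid division by zero for constant columns
  centered <- sweep(x, 2, lo, "-")
  lower + (upper - lower) * sweep(centered, 2, span, "/")
}

# Toy data spanning the first two attribute ranges of cadata
x <- rbind(c(0.4999, 1), c(15.0001, 52), c(7.75, 26.5))
bound <- apply(x, 2, range)      # 2 x N matrix: row 1 = min, row 2 = max
xs <- scale_to_range(x, bound)
print(xs)
```

The first two rows sit on the bounds and land on -1 and 1, while the third row sits at the midpoint of each range and lands on 0.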
Using the residuals component of the SVM model, we can compute the mean squared error:
> cadata.res <- cadata.rpusvm$residuals
> sum(cadata.res*cadata.res)/length(cadata.res)
[1] 4.091e+09
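The sum-of-squares expression above is the same quantity as the mean of the squared residuals, and its square root gives an error in the units of the response. A small sketch with made-up residuals, so that it runs without a GPU or a fitted model:

```r
# Made-up residuals for illustration only; with a fitted model, cadata.res
# would hold the residuals component of the SVM model instead.
res <- c(-2, 1, 3, -4)

mse.sum  <- sum(res * res) / length(res)   # the form used in this tutorial
mse.mean <- mean(res^2)                    # equivalent and more idiomatic
rmse     <- sqrt(mse.mean)                 # same units as the response

print(c(mse = mse.mean, rmse = rmse))
```

Applied to the value reported above, a mean squared error of 4.091e+09 corresponds to a root mean squared error of roughly 64,000 in the units of the response.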
As for prediction, we can apply the function predict on a test data set without further post-processing:
> pred <- predict(cadata.rpusvm, test.dat$x)
> head(pred)
1 2 3 4 5 6
401234 447244 392549 330003 244601 253104
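The post-processing that predict now spares us is the mapping of scaled predictions back to the original range of the response. As a sketch only, assuming the inverse of a linear map onto [-1, 1] and using the y.bound values shown earlier, the step could look as follows. The helper unscale_from_range and the scaled predictions are hypothetical.

```r
# Hypothetical helper, not part of RPUSVM: map values from [lower, upper]
# back to the original range given by bound = c(min, max).
# Assumption: the scaling is linear, so its inverse is linear as well.
unscale_from_range <- function(ys, bound, lower = -1, upper = 1) {
  bound[1] + (ys - lower) * (bound[2] - bound[1]) / (upper - lower)
}

y.bound <- c(14999, 500001)      # response bounds reported by the model
ys <- c(-1, 0, 1)                # made-up predictions on the scaled range
y <- unscale_from_range(ys, y.bound)
print(y)
```

The endpoints of the scaled range map back to the two response bounds, and 0 maps to their midpoint.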
If we decide to train an SVM in a terminal, we can use the standalone rpusvm-train tool with extra scale parameters:
rpusvm-train 0.1.2
http://www.r-tutor.com
Copyright (C) 2011-2012 Chi Yau. All Rights Reserved.
This software is free for academic use only. There is absolutely NO warranty.
GeForce GTX 460 GPU
.........**.
Finished optimization in 9583 iterations
nu = 0.590296
obj = -2232.72, rho = -0.300248
nSV = 12221, nBSV = 12160
Total nSV = 12221
real 0m5.171s
user 0m5.020s
sys 0m0.130s
A glance at the SVM model file cadata.rpusvm shows that the scale parameters are gathered in the header section.
SvmType: smo-epsilon-regression
KernelType: radial
Gamma: 0.125
X-scale: -1 1
0: 0.4999 15.0001
1: 1 52
2: 2 39320
3: 1 6445
4: 3 35682
5: 1 6082
6: 32.54 41.95
7: -124.35 -114.31
Y-scale: -1 1
14999 500001
NrClass: 2
TotalSV: 12221
Rho: -0.300248
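Because the scale parameters are stored as plain text, they are easy to recover from the header. The following sketch parses the X-scale lines shown above into the same 2 × N layout as the x.bound component. It assumes only the "index: lower upper" line format visible in this listing, and makes no claim about the rest of the model file.

```r
# Header lines copied from the model file listing above
header <- c("X-scale: -1 1",
            "0: 0.4999 15.0001",
            "1: 1 52",
            "2: 2 39320",
            "3: 1 6445",
            "4: 3 35682",
            "5: 1 6082",
            "6: 32.54 41.95",
            "7: -124.35 -114.31",
            "Y-scale: -1 1")

# Keep only lines of the form "<index>: <lower> <upper>"
rows <- grep("^[0-9]+:", header, value = TRUE)

# Drop the "<index>:" prefix and split the remaining two numbers
fields <- strsplit(sub("^[0-9]+:[[:space:]]*", "", rows), "[[:space:]]+")

# One column per attribute; row 1 = lower bounds, row 2 = upper bounds
x.bound <- sapply(fields, as.numeric)
print(x.bound)
```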
Finally, we can use the model for prediction without pre-scaling the test data or post-scaling the outcome.
rpusvm-predict 0.1.2
http://www.r-tutor.com
Copyright (C) 2011-2012 Chi Yau. All Rights Reserved.
This software is free for academic use only. There is absolutely NO warranty.
GeForce GTX 460 GPU
Mean squared error = 4.09101e+09
Pearson correlation coefficient = 0.698961
real 0m1.691s
user 0m1.480s
sys 0m0.170s
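The two figures reported by rpusvm-predict can be reproduced in R from a vector of predictions and the observed responses. The sketch below assumes the tool uses the standard definitions of mean squared error and Pearson correlation; the data are made up for illustration.

```r
# Made-up observed and predicted values for illustration only
y    <- c(100, 200, 300, 400, 500)
pred <- c(110, 190, 320, 380, 520)

# Mean squared error of the predictions
mse <- mean((pred - y)^2)

# Pearson correlation from its definition, then via the built-in cor()
r.manual <- sum((pred - mean(pred)) * (y - mean(y))) /
  sqrt(sum((pred - mean(pred))^2) * sum((y - mean(y))^2))
r.builtin <- cor(pred, y)

print(c(mse = mse, r = r.builtin))
```

The hand-written formula and cor() agree, so cor() alone is enough in practice.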