Support Vector Machine with GPU, Part II

In our last tutorial on SVM training with GPU, we described two necessary steps: pre-scaling the data with rpusvm-scale, and reversing the scaling on the prediction outcome. The latest RPUSVM removes this cumbersome procedure by scaling the data implicitly.

For example, we can work directly with the cadata data set from the LIBSVM site. Just load it into the R workspace with read.svm.data and apply the function rpusvm right away. The overhead of the implicit data scaling turns out to be negligible.

> library(rpud)                     # load rpudplus
> cadata <- read.svm.data("cadata", fac=FALSE)
> x <- cadata$x; y <- cadata$y
> system.time(cadata.rpusvm <- rpusvm(x, y, type="eps-regression"))
........**.
   user  system elapsed
  6.510   0.020   6.539

We can inspect the range of each attribute of cadata in the SVM model cadata.rpusvm. In particular, for a data set with N attributes, the x.bound component of cadata.rpusvm is a 2 × N matrix, with each column containing the lower and upper bounds of the corresponding attribute. Likewise, the y.bound component contains the lower and upper bounds of the response variable.

> cadata.rpusvm$x.bound
        [,1] [,2]  [,3] [,4]  [,5] [,6]  [,7]    [,8]
[1,]  0.4999    1     2    1     3    1 32.54 -124.35
[2,] 15.0001   52 39320 6445 35682 6082 41.95 -114.31

> cadata.rpusvm$y.bound
       [,1]
[1,]  14999
[2,] 500001
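To make the role of these bounds concrete, here is a minimal sketch in base R of how a matrix laid out like x.bound or y.bound could drive a linear scaling and its inverse. It assumes plain min-max scaling to the range [-1, 1], in the style of LIBSVM's svm-scale; the helper names scale_to_range and unscale_from_range are hypothetical and are not part of rpud, whose internal scaling may differ.

```r
# Assumption: linear min-max scaling to [lower, upper].
# `bound` is a 2 x N matrix laid out like x.bound: row 1 = minima, row 2 = maxima.
# Note: a constant column (max == min) would divide by zero here.
scale_to_range <- function(x, bound, lower = -1, upper = 1) {
  x <- as.matrix(x)
  mins <- bound[1, ]
  spans <- bound[2, ] - bound[1, ]
  # sweep() applies the per-column minimum and span across all rows
  unit <- sweep(sweep(x, 2, mins, "-"), 2, spans, "/")
  lower + (upper - lower) * unit
}

# Inverse map: from the scaled range back to the original units
unscale_from_range <- function(z, bound, lower = -1, upper = 1) {
  z <- as.matrix(z)
  mins <- bound[1, ]
  spans <- bound[2, ] - bound[1, ]
  unit <- (z - lower) / (upper - lower)
  sweep(sweep(unit, 2, spans, "*"), 2, mins, "+")
}

# Response bounds copied from the y.bound output above
y.bound <- matrix(c(14999, 500001), nrow = 2)
y <- c(14999, 257500, 500001)          # minimum, midpoint, maximum
y.scaled <- scale_to_range(y, y.bound)
y.back <- unscale_from_range(y.scaled, y.bound)
```

Under this assumption the minimum, midpoint and maximum of the response map to -1, 0 and 1, and the inverse map recovers the original values.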

Using the residuals component of the SVM model, we can compute the mean squared error:

> cadata.res <- cadata.rpusvm$residuals
> sum(cadata.res*cadata.res)/length(cadata.res)
[1] 4.091e+09
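The same quantity can be written more compactly with mean, and its square root gives the error in the units of the response. For the value reported above, the square root is roughly 6.4e4. The residual vector below is hypothetical and serves only to illustrate the arithmetic; in the tutorial it would be cadata.rpusvm$residuals.

```r
# Hypothetical residuals, for illustration only
res <- c(-2, 1, 3, -4)

mse <- mean(res^2)      # same value as sum(res*res)/length(res)
rmse <- sqrt(mse)       # root mean squared error, in the units of the response
```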

As for prediction, we can apply the function predict on a test data set without further post-processing:

> test.dat <- read.svm.data("test.dat", fac=FALSE)
> pred <- predict(cadata.rpusvm, test.dat$x)
> head(pred)
1 2 3 4 5 6
401234 447244 392549 330003 244601 253104

To train an SVM from a terminal instead, we can run the standalone rpusvm-train tool with the extra scale parameters -x and -y:

$ time rpusvm-train -x "-1:1" -y "-1:1" -s 3 cadata cadata.rpusvm
rpusvm-train 0.1.2
http://www.r-tutor.com
Copyright (C) 2011-2012 Chi Yau. All Rights Reserved.
This software is free for academic use only. There is absolutely NO warranty.

GeForce GTX 460 GPU

.........**.

Finished optimization in 9583 iterations
nu = 0.590296
obj = -2232.72, rho = -0.300248
nSV = 12221, nBSV = 12160
Total nSV = 12221

real    0m5.171s
user    0m5.020s
sys     0m0.130s

A glance at the SVM model file cadata.rpusvm shows that the scale parameters are gathered in the header section.

$ head -n 20 cadata.rpusvm
SvmType: smo-epsilon-regression
KernelType: radial
Gamma: 0.125
X-scale: -1 1
  0: 0.4999 15.0001
  1: 1 52
  2: 2 39320
  3: 1 6445
  4: 3 35682
  5: 1 6082
  6: 32.54 41.95
  7: -124.35 -114.31
Y-scale: -1 1
   14999 500001
NrClass: 2
TotalSV: 12221
Rho: -0.300248
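Should we want these bounds back in R, the header is simple enough to parse with base R. The sketch below assumes the layout is exactly as printed above (indented "index: min max" rows between the X-scale and Y-scale lines); the helper read_x_bounds is hypothetical and not part of rpud, and the file format may change between versions.

```r
# Header lines copied from the model file shown above
header <- c(
  "SvmType: smo-epsilon-regression",
  "KernelType: radial",
  "Gamma: 0.125",
  "X-scale: -1 1",
  "  0: 0.4999 15.0001",
  "  1: 1 52",
  "  2: 2 39320",
  "  3: 1 6445",
  "  4: 3 35682",
  "  5: 1 6082",
  "  6: 32.54 41.95",
  "  7: -124.35 -114.31",
  "Y-scale: -1 1",
  "   14999 500001",
  "NrClass: 2",
  "TotalSV: 12221",
  "Rho: -0.300248"
)

# Hypothetical helper: collect the rows between "X-scale:" and "Y-scale:"
# into a 2 x N matrix (row 1 = minima, row 2 = maxima).
read_x_bounds <- function(lines) {
  start <- grep("^X-scale:", lines)
  stop_at <- grep("^Y-scale:", lines)
  rows <- lines[(start + 1):(stop_at - 1)]
  # Drop the "index:" prefix, then split the remainder on whitespace
  fields <- strsplit(trimws(sub("^\\s*[0-9]+:", "", rows)), "\\s+")
  sapply(fields, as.numeric)
}

x.bound <- read_x_bounds(header)
```

With a real model file, the header could be obtained with readLines("cadata.rpusvm", n = 20) in place of the literal vector. The resulting matrix has the same shape as the x.bound component shown in the R session.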

Finally, we can use the model for prediction without pre-scaling the test data or post-scaling the outcome.

$ time rpusvm-predict cadata cadata.rpusvm cadata.out
rpusvm-predict 0.1.2
http://www.r-tutor.com
Copyright (C) 2011-2012 Chi Yau. All Rights Reserved.
This software is free for academic use only. There is absolutely NO warranty.

GeForce GTX 460 GPU

Mean squared error = 4.09101e+09
Pearson correlation coefficient = 0.698961

real    0m1.691s
user    0m1.480s
sys     0m0.170s
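The two figures reported by rpusvm-predict are standard regression metrics, and both can be reproduced in R from a vector of predictions and a vector of observed responses. The sketch below uses hypothetical values for illustration; it assumes the tool uses the usual definitions, which the output does not state explicitly.

```r
# Hypothetical observed and predicted values, for illustration only
obs  <- c(10, 20, 30, 40)
pred <- c(12, 18, 33, 41)

# Mean squared error of the predictions
mse <- mean((pred - obs)^2)

# Pearson correlation written out from its definition
dp <- pred - mean(pred)
dobs <- obs - mean(obs)
r.manual <- sum(dp * dobs) / sqrt(sum(dp^2) * sum(dobs^2))

# Base R equivalent; cor() computes the Pearson coefficient by default
r.builtin <- cor(pred, obs)
```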
