If you are new to RVKDE, we suggest reading the README first, and then this document. The README introduces two wrapper scripts (kde-train.pl and kde-predict.pl) for executing RVKDE, a procedure very similar to that of the well-known LIBSVM. An alternative way to use RVKDE is to execute the rvkde binary (or rvkde.exe on Windows) directly. There are two benefits to doing so:
The following sections are written mainly for Linux users. Most operations map trivially to Windows systems.
```shell
cd ~
mkdir tmp
cd ~/tmp
wget http://mbi.ee.ncku.edu.tw/rvkde/res/rvkde-current-linux32.tgz
tar zxvf rvkde-current-linux32.tgz
```
```shell
cd ~/tmp
rvkde-0.2.3-final/rvkde --classify --train -v rvkde-0.2.3-final/satimage.scale -m rvkde-0.2.3-final/satimage.scale.model --ks 10   # training
rvkde-0.2.3-final/rvkde --classify --predict -m rvkde-0.2.3-final/satimage.scale.model -V rvkde-0.2.3-final/satimage.scale.t -a 1 -b 1 --ks 10 --kt 10   # testing
```
```shell
rvkde-0.2.3-final/rvkde --classify --predict -v rvkde-0.2.3-final/satimage.scale -V rvkde-0.2.3-final/satimage.scale.t -a 1 -b 1 --ks 10 --kt 10
```
Most machine learning tools expose some parameters to their users, for example the k in the kNN classification algorithm and the k in the k-means clustering algorithm. Viewed optimistically, these parameters provide flexibility and make machine learning tools more powerful. From another point of view, however, these techniques cannot determine (or learn) such parameters automatically, so users must specify them by themselves.
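To make this concrete, here is a minimal, self-contained illustration (toy data, not part of RVKDE) of why a parameter such as the k in kNN must be supplied by the user: different values of k can yield different predictions on the very same data, and no single value is automatically "learned".

```python
# Toy kNN classifier: different k values give different answers on the
# same data, so k must be chosen by the user (e.g. via cross-validation).
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # train: list of ((x, y), label); rank by squared Euclidean distance
    ranked = sorted(train, key=lambda p: (p[0][0] - query[0])**2 + (p[0][1] - query[1])**2)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "A"), ((0.2, 0.1), "A"),
         ((1, 1), "B"), ((1.1, 0.9), "B"), ((0.6, 0.5), "B")]
print(knn_predict(train, (0.4, 0.4), 1))  # -> B (single nearest point decides)
print(knn_predict(train, (0.4, 0.4), 3))  # -> A (majority of three nearest decides)
```

With k = 1 the lone nearest neighbour wins; with k = 3 the majority vote flips the answer, which is exactly the kind of sensitivity that parameter selection has to address.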
RVKDE provides two alternative ways to do its parameter selection, as described in the following two sections.
```shell
rvkde-0.2.3-final/rvkde --cv --classify --acc -n 5 -v rvkde-0.2.3-final/satimage.scale -a 1 -b 1,2,0.5 --ks 1,30,1 --kt 1,30
```
Let's take a look at the command.
--cv | Switch rvkde into cross-validation mode. |
---|---|
--classify | Tell rvkde to do classification rather than regression. |
--acc | Use accuracy as the evaluation index. |
-n | Do n-fold cross-validation. |
-v | Followed by the dataset for cross-validation. |
-a | Set the range (begin, end and step) of alpha values of RVKDE. In this example, 1 is the only possible alpha value. |
-b | Set the range (begin, end and step) of beta values of RVKDE. In this example, the possible beta values are 1, 1.5 and 2. |
--ks | Set the range (begin, end and step) of ks values of RVKDE. In this example, the possible ks values are 1, 2, …, 30. |
--kt | Set the range (begin, end and step) of kt values of RVKDE. In this example, the possible kt values are also 1, 2, …, 30, since the default step is 1. |
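The -n option above deserves a brief illustration. The following sketch (not RVKDE's code) shows what n-fold cross-validation means: the data are split into n folds, and each fold serves once as the held-out part while the remaining folds are used for training.

```python
# Sketch of n-fold cross-validation splitting (illustrative, not RVKDE's
# implementation): every item appears in exactly one held-out fold.
def n_fold_splits(items, n):
    """Yield (train, held_out) pairs for n-fold cross-validation."""
    folds = [items[i::n] for i in range(n)]  # round-robin split into n folds
    for i in range(n):
        held_out = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, held_out

data = list(range(10))
for train, held in n_fold_splits(data, 5):
    print(len(train), len(held))  # -> 8 2, five times
```

The accuracy reported by --cv for a parameter combination is the performance aggregated over all n held-out folds.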
```
[0.918602] a=1 b=1 s=8 t=21...
```
This tells us that the best parameter combination is alpha = 1, beta = 1, ks = 8 and kt = 21. In addition, the accuracy under the best parameters is 0.918602.
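Under the hood, the --cv mode amounts to a grid search: every (alpha, beta, ks, kt) combination in the given ranges is tried, and the one with the best cross-validation accuracy is kept. The following sketch uses hypothetical helper names and a toy scoring function, not RVKDE's actual code:

```python
# Sketch of the --cv grid search (hypothetical names, not RVKDE internals).
import itertools

def frange(begin, end, step):
    """Inclusive range matching rvkde's begin,end,step option syntax."""
    vals, v = [], begin
    while v <= end + 1e-9:          # tolerance for floating-point steps
        vals.append(round(v, 10))
        v += step
    return vals

def grid_search(evaluate, a_rng, b_rng, ks_rng, kt_rng):
    """`evaluate(a, b, ks, kt)` returns a CV accuracy; return the best combo."""
    best = None
    for a, b, ks, kt in itertools.product(
            frange(*a_rng), frange(*b_rng), frange(*ks_rng), frange(*kt_rng)):
        acc = evaluate(a, b, ks, kt)
        if best is None or acc > best[0]:
            best = (acc, a, b, ks, kt)
    return best

# Toy stand-in for a real 5-fold CV run; its score peaks at ks=8, kt=21.
toy_eval = lambda a, b, ks, kt: 1.0 - abs(ks - 8) * 0.01 - abs(kt - 21) * 0.001
print(grid_search(toy_eval, (1, 1, 1), (1, 2, 0.5), (1, 30, 1), (1, 30, 1)))
# -> (1.0, 1, 1, 8, 21)
```

In the real run, `evaluate` would be a full n-fold cross-validation of RVKDE, which is why a wide grid (here 1 x 3 x 30 x 30 combinations) can take a while.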
Now we have a parameter combination derived from cross-validation.
```shell
rvkde-0.2.3-final/rvkde --predict --classify --acc -v rvkde-0.2.3-final/satimage.scale -V rvkde-0.2.3-final/satimage.scale.t -a 1 -b 1 --ks 8 --kt 21
```
Let's take a look at the command.
--predict | Switch RVKDE into prediction mode (rather than cross-validation). |
---|---|
-v | Followed by the training dataset. |
-V | Followed by the testing dataset. |
```
[0.9175] a=1 b=1 s=8 t=21...
```
This indicates that RVKDE yields an accuracy of 0.9175 under this parameter combination when training on satimage.scale and predicting satimage.scale.t.
Another common procedure for parameter selection is to create an independent validation set. For example, you can use satimage.scale.tr and satimage.scale.val to do parameter selection, and then see how good the selected parameters are when applied to satimage.scale.t.
```shell
rvkde-0.2.3-final/rvkde --predict --classify --acc -v rvkde-0.2.3-final/satimage.scale.tr -V rvkde-0.2.3-final/satimage.scale.val -a 1 -b 1,2,0.5 --ks 1,30,1 --kt 1,30,1
```

```
[0.913599] a=1 b=1 s=8 t=23...
```
```shell
rvkde-0.2.3-final/rvkde --predict --classify --acc -v rvkde-0.2.3-final/satimage.scale.tr -V rvkde-0.2.3-final/satimage.scale.t -a 1 -b 1 --ks 8 --kt 23
```

```
[0.917] a=1 b=1 s=8 t=23...
```
This reveals that RVKDE yields very similar accuracies (0.9175 vs. 0.917) under these two parameter selection schemes.
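The hold-out scheme above can be sketched in a few lines of Python. The function names and the score tables below are illustrative stand-ins for rvkde runs (the numbers echo the outputs quoted above), not RVKDE's API:

```python
# Sketch of hold-out parameter selection: pick the parameters that score
# best on the validation set, then report their score on the untouched
# test set. Toy scores stand in for real rvkde runs.
def pick_best(candidates, val_score):
    """Return the candidate parameters with the highest validation score."""
    return max(candidates, key=val_score)

# Hypothetical (ks, kt) -> accuracy tables, loosely based on the runs above.
val_scores  = {(8, 21): 0.9130, (8, 23): 0.9136, (10, 21): 0.9100}
test_scores = {(8, 21): 0.9175, (8, 23): 0.9170, (10, 21): 0.9120}

best = pick_best(val_scores, val_scores.get)
print(best, test_scores[best])  # -> (8, 23) 0.917
```

The key discipline is that the test set never influences the choice of parameters; it is consulted only once, at the end, to measure how well the chosen combination generalizes.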