Semagle.Framework


SVM classification and regression

This tutorial shows how to train SVM models and evaluate their performance on two-class classification, one-class classification and regression problems using LIBSVM datasets.

Initialization

First, we need to load the Semagle framework assemblies for manipulating vector data and for SVM training and prediction.

#r "Semagle.Numerics.Vectors.dll"
#r "Semagle.Numerics.Vectors.IO.dll"
#r "Semagle.MachineLearning.Metrics.dll"
#r "Semagle.MachineLearning.SVM.dll"

open LanguagePrimitives
open System

open Semagle.Numerics.Vectors
open Semagle.Numerics.Vectors.IO
open Semagle.MachineLearning.Metrics
open Semagle.MachineLearning.SVM

Reading LIBSVM Data

Semagle.Numerics.Vectors.IO provides the function LibSVM.read, which returns a lazy sequence of (y, x) pairs, whereas Semagle.MachineLearning.SVM requires separate arrays of labels and samples for training. The function readData converts the sequence to an array and splits the array of pairs into two arrays. The training and test file names are taken from the script's command-line arguments.

let readData file = LibSVM.read file |> Seq.toArray |> Array.unzip

let train_y, train_x = readData fsi.CommandLineArgs.[1]

let test_y, test_x = readData fsi.CommandLineArgs.[2]
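
To make the (y, x) pairs concrete, here is a minimal sketch of how a single line in LIBSVM format ("label index:value index:value ...") could be parsed by hand. The helper parseLine is hypothetical and is not part of Semagle; LibSVM.read may represent the sample differently (for example, as a sparse vector type rather than an array of pairs).

```fsharp
open System
open System.Globalization

// Hypothetical helper for illustration only; not part of Semagle.
// Parses one LIBSVM line into a label and an array of (index, value) pairs.
let parseLine (line : string) =
    let tokens =
        line.Split([| ' '; '\t' |], StringSplitOptions.RemoveEmptyEntries)
    // The first token is the label, e.g. "+1" or "-1".
    let label = Single.Parse(tokens.[0], CultureInfo.InvariantCulture)
    // The remaining tokens are "index:value" pairs.
    let features =
        tokens.[1..]
        |> Array.map (fun token ->
            let parts = token.Split(':')
            int parts.[0], Single.Parse(parts.[1], CultureInfo.InvariantCulture))
    label, features

let y, x = parseLine "+1 1:0.5 3:1.25"
// y = 1.0f, x = [| (1, 0.5f); (3, 1.25f) |]
```

Only the non-zero features are listed in the file, which is why each value carries its own index.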

Training

Three functions, SMO.C_SVC, SMO.OneClass and SMO.C_SVR, build SVM models for two-class classification, one-class classification and regression, respectively. Each takes the samples array train_x, the labels array train_y (except for SMO.OneClass), a kernel function (here Kernel.rbf 0.1f), parameters specific to the particular optimization problem, and the optimization options.
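
For reference, the RBF (Gaussian) kernel is commonly defined as K(a, b) = exp(-gamma * ||a - b||^2). The sketch below computes it over dense float32 arrays. It assumes that the argument of Kernel.rbf plays the role of gamma, which this tutorial does not state, and rbfDense is a hypothetical name, not a Semagle function.

```fsharp
// Hypothetical dense-array RBF kernel for illustration; Semagle's Kernel.rbf
// works on its own vector types and may differ in detail.
let rbfDense (gamma : float32) (a : float32 []) (b : float32 []) =
    // Squared Euclidean distance between a and b (arrays must have equal length).
    let squaredDistance =
        Array.fold2 (fun acc ai bi -> acc + (ai - bi) * (ai - bi)) 0.0f a b
    exp (-gamma * squaredDistance)

// Identical vectors have zero distance, so the kernel value is 1.
let same = rbfDense 0.1f [| 1.0f; 0.0f |] [| 1.0f; 0.0f |]
// same = 1.0f

// The value decreases toward 0 as the vectors move apart.
let apart = rbfDense 0.1f [| 0.0f |] [| 1.0f |]
```

A larger gamma makes the kernel value fall off faster with distance, so each support vector influences a smaller neighbourhood.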

Two Class

The two-class classification problem requires separate penalties for positive (C_p) and negative (C_n) samples:

let svm = SMO.C_SVC train_x train_y (Kernel.rbf 0.1f)
                    { C_p = 1.0f; C_n = 1.0f } SMO.defaultOptimizationOptions

One Class

The one-class classification problem requires nu, the fraction of support vectors:

let svm = SMO.OneClass train_x (Kernel.rbf 0.1f) 
                      { nu = 0.5f } SMO.defaultOptimizationOptions

Regression

The regression problem requires the boundary eta and the penalty C:

let svm = SMO.C_SVR train_x train_y (Kernel.rbf 0.1f)
                    { eta = 0.1f; C = 1.0f } SMO.defaultOptimizationOptions

Predicting

There are three prediction functions: TwoClass.predict, OneClass.predict and Regression.predict. Each takes the model svm and a sample x and returns the prediction y. Because F# functions are curried, a prediction function can be partially applied to the model

  • Two Class

    let predict = TwoClass.predict svm
    
  • One Class

     let predict = OneClass.predict svm
    
  • Regression

     let predict = Regression.predict svm   
    

and then mapped over the array of test samples:

let predict_y = test_x |> Array.map (fun x -> predict x)
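
The pattern of fixing the model first and mapping the resulting function over the samples can be tried without Semagle. The following sketch uses a hypothetical linear model (LinearModel and predictWith are made up for illustration and stand in for the trained svm and the predict functions above).

```fsharp
// Hypothetical stand-in for a trained model: a weight vector and a bias.
type LinearModel = { W : float32 []; Bias : float32 }

// Hypothetical prediction function: model first, sample second,
// mirroring the argument order of the predict functions above.
let predictWith (model : LinearModel) (x : float32 []) =
    // Score is the bias plus the dot product of weights and sample.
    let score = Array.fold2 (fun acc w xi -> acc + w * xi) model.Bias model.W x
    if score >= 0.0f then 1.0f else -1.0f

let model = { W = [| 1.0f; -1.0f |]; Bias = 0.0f }

// Partial application: the model is fixed once, leaving a function of the sample.
let predictSample = predictWith model

let labels =
    [| [| 2.0f; 1.0f |]; [| 0.0f; 3.0f |] |]
    |> Array.map predictSample
// labels = [| 1.0f; -1.0f |]
```

Since the partially applied function already has the type sample -> label, it can be passed to Array.map directly, without wrapping it in a lambda.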

Evaluating

Two widely used metrics for evaluating performance are:

  • Accuracy (two and one class classification)

     let accuracy = Classification.accuracy test_y predict_y
    
  • Mean Squared Error (regression)

     let mse = Regression.mse test_y predict_y
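
As a reminder of what the two metrics measure, here is a minimal sketch of their textbook definitions over float32 arrays. The functions accuracyOf and mseOf are hypothetical re-implementations for illustration; the Semagle versions may use different types or return the accuracy on a different scale.

```fsharp
// Hypothetical re-implementations for illustration; not the Semagle functions.

// Accuracy: the fraction of predictions that equal the expected label.
let accuracyOf (expected : float32 []) (predicted : float32 []) =
    let correct =
        Array.zip expected predicted
        |> Array.filter (fun (e, p) -> e = p)
        |> Array.length
    float32 correct / float32 expected.Length

// Mean squared error: the average of squared differences.
let mseOf (expected : float32 []) (predicted : float32 []) =
    Array.map2 (fun e p -> (e - p) * (e - p)) expected predicted
    |> Array.average

let acc = accuracyOf [| 1.0f; -1.0f; 1.0f; 1.0f |] [| 1.0f; -1.0f; -1.0f; 1.0f |]
// acc = 0.75f (3 of 4 labels match)

let err = mseOf [| 1.0f; 2.0f; 3.0f; 4.0f |] [| 1.5f; 2.0f; 2.0f; 4.0f |]
// err = 0.3125f ((0.25 + 0 + 1 + 0) / 4)
```

Higher accuracy is better, while a lower mean squared error is better.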
    