Support vector machine (SVM) is a powerful supervised machine learning algorithm used for classification and regression tasks.
The objective of SVM is to find the optimal hyperplane that separates the data points into different classes or categories in the feature space while maximizing the margin between the classes.
SVM is effective for binary classification problems but can be extended to handle multi-class classification and regression tasks.
Hyperplane
A hyperplane is a flat affine subspace of dimension \(p-1\) in a \(p\)-dimensional feature space. It is a decision boundary that separates the feature space into different regions corresponding to different classes.
In \(p>3\) dimensions, it can be hard to visualise a hyperplane, but the notion of a \(p-1\) -dimensional flat subspace still applies.
The mathematical definition of a hyperplane in \(p\) dimensions is:
\[
h(x)= \beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}+...+\beta_{p}X_{p}=0
\]
This defines a hyperplane (a flat \((p-1)\)-dimensional subspace) for points \(X=(X_{1},X_{2},\ldots,X_{p})^T\) in \(p\)-dimensional space.
Suppose that \(X\) does not satisfy the equation exactly. If \(h(X)>0\), then \(X\) lies on the \(+1\) class side (green); if \(h(X)<0\), then \(X\) lies on the \(-1\) class side (orange). A separating hyperplane therefore satisfies \(y_{i} \times h(x_{i})>0\) for every training observation \(i\).
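As a minimal illustration (with made-up coefficients \(\beta\), not estimated from any data), the sign of \(h(X)\) tells us on which side of the hyperplane a point falls:

```python
import numpy as np

# Hypothetical hyperplane in p = 2 dimensions: h(x) = beta0 + beta1*x1 + beta2*x2
beta0 = -1.0
beta = np.array([2.0, 3.0])

def h(x):
    """Evaluate the hyperplane equation h(x) = beta0 + beta^T x."""
    return beta0 + beta @ x

x_a = np.array([1.0, 1.0])   # h(x_a) = -1 + 2 + 3 = 4 > 0  -> +1 side (green)
x_b = np.array([0.0, 0.0])   # h(x_b) = -1 < 0              -> -1 side (orange)
print(np.sign(h(x_a)), np.sign(h(x_b)))  # 1.0 -1.0
```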
The separating hyperplane is generally not unique: one can define many separating hyperplanes for the same dataset. To choose the “best” one (the maximal margin hyperplane), we compute the perpendicular distance from each training observation to the hyperplane and retain the smallest such distance (called the “margin”).
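For reference, the perpendicular distance from a point \(X\) to the hyperplane is given by the standard formula
\[
d(X)=\frac{|h(X)|}{\sqrt{\beta_{1}^{2}+\beta_{2}^{2}+\cdots+\beta_{p}^{2}}},
\]
and the margin is the smallest such distance over the training observations.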
Maximal Margin
The maximal margin is the distance between the hyperplane and the nearest data points (support vectors) from each class. The optimal hyperplane in SVM is the one that maximizes this margin, providing the largest separation between the classes.
The maximal margin hyperplane is the separating hyperplane for which the margin is largest; it is the hyperplane with the largest minimum distance to the training observations.
The maximal margin hyperplane is determined by searching for the largest minimal distance between the observations and the separating hyperplane. It is the solution of an optimization problem.
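One common way to state this optimization problem (the standard maximal margin formulation) is:
\[
\max_{\beta_{0},\beta_{1},\ldots,\beta_{p},\,M} M
\quad \text{subject to} \quad
\sum_{j=1}^{p}\beta_{j}^{2}=1,
\quad
y_{i}\left(\beta_{0}+\beta_{1}x_{i1}+\cdots+\beta_{p}x_{ip}\right)\ge M
\;\; \text{for all } i=1,\ldots,N.
\]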
The penalization parameter \(C\) is equal to \(\frac{1}{\tau}\) and plays a central role in classification performance: when \(C\) is larger, it produces tighter margins; when \(C\) is smaller, it produces wider margins. It is a trade-off between prediction variance and prediction bias.
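With slack variables \(\varepsilon_{i}\) that allow margin violations (discussed in the next section), a commonly used penalized form in which \(C\) appears is:
\[
\min_{\beta_{0},\beta,\varepsilon}\;\frac{1}{2}\lVert\beta\rVert^{2}+C\sum_{i=1}^{N}\varepsilon_{i}
\quad \text{subject to} \quad
y_{i}\left(\beta_{0}+\beta^{T}x_{i}\right)\ge 1-\varepsilon_{i},\;\; \varepsilon_{i}\ge 0,
\]
so a larger \(C\) penalizes violations more heavily (tighter margin), while a smaller \(C\) tolerates more violations (wider margin).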
Non-Separable Classes
If the classes are not separable, a maximal margin hyperplane able to perfectly separate them simply does not exist. In this case, we would like to allow some observations to cross the boundaries defined by the margins, i.e. possibly allow for misclassification.
Solution: provide greater robustness to specific configurations of the observations while preserving good classification accuracy for most of the training observations.
There are two approaches to dealing with non-separable classes in the SVM (a sketch of both follows the list below):
- Feature polynomial expansion

- SVM with kernel adaptation
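
A minimal scikit-learn sketch of both approaches on a synthetic dataset (the dataset, degree, and parameter values are chosen purely for illustration):

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC, SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# 1) Feature polynomial expansion: expand the features, then fit a linear SVM
expansion_svm = make_pipeline(
    PolynomialFeatures(degree=3),
    StandardScaler(),
    LinearSVC(C=1.0, max_iter=10_000),
)
expansion_svm.fit(X, y)

# 2) Kernel adaptation: let the kernel perform the expansion implicitly
kernel_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print(expansion_svm.score(X, y), kernel_svm.score(X, y))
```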

Kernels
SVM utilises kernel functions to map the input data points into a higher-dimensional space where the separation between the two classes becomes easier. This allows SVM to solve complex non-linear problems as well.
The support vector classifier solution can be represented using inner products between observations:
\[
G(x)= \beta_{0}+\sum_{i=1}^{N}\alpha_{i}\langle x, x_{i}\rangle
\]
where \(\langle x, x_{i}\rangle\) is the inner product between the vectors \(x\) and \(x_{i}\), and \(\alpha_{i}\) is a parameter specific to every observation \(i=1,2,3,...,N\).
The inner product \(\langle x, x_{i}\rangle\) can be generalized to a nonlinear transformation using a kernel as follows:
\[
G(x)= \beta_{0}+\sum_{i=1}^{N}\alpha_{i}K(x,x_{i})
\]
where \(K(\cdot,\cdot)\) is a kernel function.
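A minimal numpy sketch of evaluating this decision function, assuming the \(\alpha_{i}\) and \(\beta_{0}\) have already been estimated (the values below are invented for illustration):

```python
import numpy as np

def linear_kernel(x, xi):
    """Linear kernel: the inner product <x, x_i> (any other kernel could be swapped in)."""
    return x @ xi

def decision_function(x, X_train, alpha, beta0, kernel=linear_kernel):
    """G(x) = beta0 + sum_i alpha_i * K(x, x_i)."""
    return beta0 + sum(a * kernel(x, xi) for a, xi in zip(alpha, X_train))

# Hypothetical fitted quantities; in practice alpha_i is nonzero only for support vectors
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
alpha = np.array([0.0, 0.8, -0.5])
beta0 = 0.1

x_new = np.array([0.5, 0.5])
print(np.sign(decision_function(x_new, X_train, alpha, beta0)))  # predicted class side
```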
Two kernel transformations are generally used:
Polynomial kernel
\[
K(x_{i},x_{i'})=\left[1+\sum_{j=1}^{p} x_{ij}\, x_{i'j}\right]^{d}
\]
where \(d=1\) is equivalent to the linear support vector classifier.
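A direct numpy translation of this kernel (values for illustration only):

```python
import numpy as np

def polynomial_kernel(xi, xj, d=2):
    """Polynomial kernel: (1 + sum_j x_ij * x_i'j)^d."""
    return (1.0 + xi @ xj) ** d

xi = np.array([1.0, 2.0])
xj = np.array([0.5, -1.0])
print(polynomial_kernel(xi, xj, d=1))  # d = 1: reduces to 1 + <x_i, x_i'> (linear case)
print(polynomial_kernel(xi, xj, d=3))
```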
Radial basis kernel
\[
K(x_{i},x_{i'})=\exp\left[-\gamma\sum_{j=1}^{p}(x_{ij}-x_{i'j})^{2}\right]
\]
It is a function of the Euclidean distance between the pair of observations \(i\) and \(i'\) rather than of their inner product.
The radial basis kernel behaves similarly to a KNN (k-Nearest Neighbor) classifier, and choosing \(\gamma\) controls how local the classification is. When \(\gamma\) is small, the kernel decays slowly with distance, so even distant training observations can contribute substantially to the classification of a given test point (comparable to a large \(k\) in kNN). When \(\gamma\) is large, the kernel decays quickly, so distant observations count much less and only nearby points matter (comparable to a small \(k\) in kNN).
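An illustrative scikit-learn sketch of how \(\gamma\) controls this behaviour (dataset and values chosen only for demonstration):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

for gamma in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma)
    score = cross_val_score(clf, X, y, cv=5).mean()
    # Small gamma -> smooth, global boundary (like a large k in kNN);
    # large gamma -> very local boundary that can overfit (like a small k in kNN).
    print(f"gamma={gamma:>6}: CV accuracy = {score:.3f}")
```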
Advantages of SVM
- Training is relatively easy: there are no local minima, since the underlying optimization problem is convex.
- It scales relatively well to high-dimensional data.
- The trade-off between classifier complexity and error can be controlled explicitly via \(C\).
- The results are robust.
- The “curse of dimensionality” is avoided.
Disadvantages of SVM
- The regularization parameter \(C\), which controls the trade-off between maximizing the margin and minimizing the classification error, must be chosen carefully.
- A smaller value of \(C\) allows for a wider margin and more misclassifications.
- A larger value of \(C\) penalizes misclassifications more heavily, leading to a narrower margin.
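An illustrative scikit-learn sketch of this trade-off (synthetic data; parameter values are arbitrary):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C -> wide margin, many support vectors, more tolerated misclassifications;
    # large C -> narrow margin, fewer support vectors, violations penalized heavily.
    print(f"C={C:>6}: support vectors = {clf.n_support_.sum()}, "
          f"training accuracy = {clf.score(X, y):.3f}")
```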