
Homework 3

Qiu G

3/31/2020

Problem 1

Part 1)

Let F = Rd with the standard inner product. ∀x ∈ Rd, we define φ : Rd → F such that φ(x) = x.
We show that k∗ is a valid kernel.

k∗(x, x′) = ⟨φ(x), φ(x′)⟩F (1)
= ⟨x, x′⟩ (2)
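As an informal numerical sanity check (not part of the proof), we can verify on arbitrary random points that the Gram matrix of this kernel has no negative eigenvalues; the data, size, and tolerance below are illustrative choices only.

#Sanity check (illustrative): K[i,j] = <xi, xj> should be positive semidefinite
set.seed(1)
X <- matrix(rnorm(20), nrow = 5)  #5 arbitrary points in R^4
K <- X %*% t(X)                   #linear-kernel Gram matrix
all(eigen(K, symmetric = TRUE)$values >= -1e-8)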

Part 2)

We know that k1(x, x′) is a valid kernel, so its Gram matrix on any finite set of points is positive semidefinite. For real a > 0, multiplying by a preserves this: for any coefficients ci, Σi Σj ci cj a k1(xi, xj) = a Σi Σj ci cj k1(xi, xj) ≥ 0. Hence a·k1(x, x′) is a valid kernel.
Hence:

k∗(x, x′) = a k1(x, x′) (3)


is a valid kernel.

Part 3)

We know that k1(x, x′) is a valid kernel. We use Mercer’s Theorem and the following decomposition:


k∗(x, x′) = g(x)[Σj λj φj(x) φj(x′)]g(x′) (4)
= Σj λj g(x)φj(x) g(x′)φj(x′) (5)
= ⟨g(x)φ(x), g(x′)φ(x′)⟩F (6)
= ⟨φ∗(x), φ∗(x′)⟩ (7)

where φ∗(x) = g(x)φ(x), so k∗ is a valid kernel.
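Numerically, this says the Gram matrix of k∗ is D K D with D = diag(g(x1), …, g(xn)), which stays positive semidefinite. A small illustrative check, reusing X and K from Part 1 and an arbitrary choice of g:

#Illustrative check: D K D remains positive semidefinite for any g
g <- function(x) sum(x^2)  #arbitrary real-valued function g
D <- diag(apply(X, 1, g))  #D = diag(g(x1), ..., g(xn))
K.star <- D %*% K %*% D    #Gram matrix of g(x) k1(x, x') g(x')
all(eigen(K.star, symmetric = TRUE)$values >= -1e-8)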

Part 4)

Both k1(x, x′) and k2(x, x′) are valid kernels. Suppose φ1(x) and φ2(x) correspond to these kernels, respectively. We define φ∗ : Rd → F such that φ∗i,j(x) = φ1,i(x)φ2,j(x). We can then see that:

k∗(x, x′) = ⟨φ∗(x), φ∗(x′)⟩F (8)
= ⟨φ1,i(x)φ2,j(x), φ1,i(x′)φ2,j(x′)⟩ (9)
= k1(x, x′) k2(x, x′) (10)
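This is the Schur product theorem: the elementwise product of two positive semidefinite Gram matrices is again positive semidefinite. An illustrative check with two arbitrary kernels, reusing X from Part 1:

#Illustrative check: the elementwise (Hadamard) product of two Gram matrices
K1 <- X %*% t(X)                  #linear kernel
K2 <- exp(-as.matrix(dist(X))^2)  #Gaussian kernel with unit bandwidth
all(eigen(K1 * K2, symmetric = TRUE)$values >= -1e-8)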

Part 5)

We import the data.

#Import data, and load svm library


library(tidyverse)

## -- Attaching packages -------------------------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.0 v purrr 0.3.4


## v tibble 3.0.1 v dplyr 0.8.5
## v tidyr 1.0.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0

## -- Conflicts ----------------------------------------------------------- tidyverse_conflicts() --


## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()

library(e1071)

data1 <- read_csv("~/Downloads/HW3Problem1.csv") %>% mutate(class = factor(class))

## Parsed with column specification:


## cols(
## x1 = col_double(),
## x2 = col_double(),
## class = col_character()
## )

#Train svm
svm1 <- svm(class~.,data = data1)
svm1

##
## Call:
## svm(formula = class ~ ., data = data1)
##
##
## Parameters:
## SVM-Type: C-classification

## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 347

#Generate plot
plot(svm1, data = data1)

[Figure: SVM classification plot — decision regions for classes A and B from the radial-kernel fit, with x2 on the horizontal axis and x1 on the vertical axis; support vectors are drawn as ‘x’ and remaining points as ‘o’.]
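The fit above uses e1071’s defaults (radial kernel, cost = 1). Not required for this problem, but if we wanted to choose these hyperparameters by cross-validation, e1071’s tune() supports it; the grid below is an arbitrary sketch.

#Optional: 10-fold CV over an arbitrary small grid (sketch only)
tuned <- tune(svm, class ~ ., data = data1,
              ranges = list(cost = c(0.1, 1, 10), gamma = c(0.1, 0.5, 1)))
summary(tuned)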

Problem 2

Part 1)

We construct the hinge loss function g(z).

#Range
z <- seq(-1, 3, by = 0.001)

#Hinge Loss
g <- function(z){
  result <- ifelse(1 - z >= 0, 1 - z, 0)
  return(result)
}

#Create tibble
t1 <- tibble(z = z) %>% mutate(loss = g(z))

#Create plot
g1 <- ggplot(data = t1, mapping = aes(x = z, y = loss)) +
  geom_line() +
  labs(title = "Hinge Loss g(z)")
g1

[Figure: Hinge Loss g(z) — loss on the vertical axis (0 to 2) against z on the horizontal axis (−1 to 3); the loss falls linearly from 2 at z = −1 to 0 at z = 1 and is 0 thereafter.]
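An equivalent, slightly more idiomatic way to write the hinge loss uses pmax(); g.alt is just an illustrative name.

#Equivalent vectorized form of g(z)
g.alt <- function(z) pmax(1 - z, 0)
all.equal(g(z), g.alt(z))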

Part 2)

When zi < 1, we have Qi(vH, c) = 1 − yi(w1 xi,1 + w2 xi,2 + c) + (λ/2)(w1² + w2²). Therefore, differentiating with respect to (w1, w2, c):

∇Qi(vH, c) = (−yi xi,1 + λw1, −yi xi,2 + λw2, −yi)ᵀ (11)

When zi > 1, we have Qi(vH, c) = (λ/2)(w1² + w2²). Therefore:

∇Qi(vH, c) = (λw1, λw2, 0)ᵀ (12)
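As a quick finite-difference check of formula (11) (illustrative only; the sample, label, and parameter values are arbitrary and chosen so that zi < 1):

#Finite-difference check of the gradient in (11)
lambda <- 0.25
xi <- c(0.5, -1.2); yi <- 1  #arbitrary sample with label +1
Qi <- function(p) {          #p = (w1, w2, c); z = 0.73 < 1 at the point below
  max(0, 1 - yi * (p[1] * xi[1] + p[2] * xi[2] + p[3])) +
    (lambda / 2) * (p[1]^2 + p[2]^2)
}
p <- c(0.3, -0.4, 0.1)
num.grad <- sapply(1:3, function(k) {
  h <- 1e-6
  e <- replace(numeric(3), k, h)
  (Qi(p + e) - Qi(p - e)) / (2 * h)  #central difference
})
ana.grad <- c(-yi * xi[1] + lambda * p[1],
              -yi * xi[2] + lambda * p[2],
              -yi)
all.equal(num.grad, ana.grad, tolerance = 1e-4)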

Part 3)

Here β is the parameter vector, with β1 the intercept c and (β2, β3) the weights (w1, w2).

#Load data
data2 <- read_csv("~/Downloads/HW3Problem2.csv")

## Parsed with column specification:


## cols(
## x1 = col_double(),
## x2 = col_double(),
## y = col_double()
## )

#Create Algorithm
gradient.descent <- function(data, lambda = 0.25, iter = 10000){
  n <- nrow(data)
  x1 <- data[[1]]
  x2 <- data[[2]]
  y <- data[[3]]
  beta <- runif(3)  #random initialization
  for (i in 1:iter) {
    z <- y * (beta[1] + beta[2] * x1 + beta[3] * x2)
    #Average the per-sample (sub)gradients from Part 2
    sum <- c(0, 0, 0)
    for (j in 1:n) {
      if (z[j] < 1) {
        sum <- sum + c(-y[j],
                       -y[j] * x1[j] + lambda * beta[2],
                       -y[j] * x2[j] + lambda * beta[3])
      }
      else {
        sum <- sum + c(0, lambda * beta[2], lambda * beta[3])
      }
    }
    sum <- sum / n
    eta <- 1 / (i * lambda)  #decreasing step size
    beta <- beta - eta * sum
  }
  return(beta)
}

#Evaluate beta
beta.hat <- gradient.descent(data2)

#Display
beta.hat

## [1] -2.8072328 -0.8211780 0.8368775
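As an informal cross-check (not part of the assignment), a linear-kernel SVM from e1071 fit to the same data should give a hyperplane pointing in a similar direction; since the two objectives are parameterized differently (cost C versus λ), the coefficients agree only up to sign and scale.

#Optional cross-check: hyperplane from a linear-kernel SVM
d <- data2 %>% mutate(y = factor(y))
svm.lin <- svm(y ~ ., data = d, kernel = "linear", scale = FALSE)
w <- t(svm.lin$coefs) %*% svm.lin$SV  #weight vector (w1, w2)
b <- -svm.lin$rho                     #intercept
c(b, w)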

Part 4)

We plot the points with the decision boundary from the previous part. The boundary is the set where β1 + β2 x1 + β3 x2 = 0, i.e. the line x2 = −β1/β3 − (β2/β3) x1.

data3 <- data2 %>% mutate(y = factor(y))
g2 <- ggplot(data = data3, mapping = aes(x = x1, y = x2, colour = y)) +
  geom_point() +
  geom_abline(slope = -beta.hat[2] / beta.hat[3],
              intercept = -beta.hat[1] / beta.hat[3]) +  #intercept of the line x2 = -(b1 + b2*x1)/b3
  labs(title = "SVM")
g2

[Figure: SVM — scatter of the data coloured by class y (−1 or 1) with the fitted decision boundary x2 = −β1/β3 − (β2/β3) x1 overlaid.]
