-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
Closed
Description
I believe that setting gamma='scale'
in SVC
is not meeting its intended purpose of being invariant to the scale of X
. Currently, gamma
is set to 1 / (n_features * X.std())
. However, I believe it should be 1 / (n_features * X.var())
.
Rationale: if you scale X
by 10 you need to scale gamma
by 1/100, not 1/10, to achieve the same results. See the definition of the RBF kernel here: the "units" of gamma
are 1/x^2, not 1/x.
I also tested this empirically: scaling X
by 10 and scaling gamma
by 1/100 gives the same result as the original, whereas scaling X
by 10 and scaling gamma
by 1/10 gives a different result. Here is some code:
import numpy as np
from sklearn.svm import SVC
X = np.random.rand(100,10)
y = np.random.choice(2,size=100)
svm = SVC(gamma=1)
svm.fit(X,y)
print(svm.decision_function(X[:5]))
# scale X by 10, gamma by 1/100
svm = SVC(gamma=0.01)
svm.fit(10*X,y)
print(svm.decision_function(10*X[:5])) # prints same result
# scale X by 10, gamma by 1/10
svm = SVC(gamma=0.1)
svm.fit(10*X,y)
print(svm.decision_function(10*X[:5])) # prints different result
Note that gamma='scale'
will become the default setting for gamma
in version 0.22.
TomDLT, agdal1125, hfboyce, vansjyo, joelostblom and 1 more