Description
As far as I could search, I didn't find any existing report of this bug. I'm not sure I used the right keywords, but here we go...
Describe the bug
I am training a model. After 6 epochs the training is interrupted. According to the documentation, early_stopping is False by default. And even if it were True, the condition described for the tol argument is not fulfilled when looking at the loss values over the epochs.
tol : float, default=1e-3
The stopping criterion. If it is not None, training will stop when (loss > best_loss - tol) for n_iter_no_change consecutive epochs.
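For reference, this is how I read that criterion (a minimal sketch in plain Python; the function and variable names are mine, not scikit-learn internals):

def stops_at(losses, tol=1e-3, n_iter_no_change=5):
    """Return the 1-based epoch at which (loss > best_loss - tol) has held
    for n_iter_no_change consecutive epochs, or None if it never does."""
    best_loss = float("inf")
    no_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss > best_loss - tol:
            no_improvement += 1
        else:
            no_improvement = 0
        best_loss = min(best_loss, loss)
        if no_improvement >= n_iter_no_change:
            return epoch
    return None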
Steps/Code to Reproduce
- I am using this dataset:
raw-data.csv
T,A1,A2,A3
33.2,3.5,9.0,6.1
40.3,5.3,20.0,6.4
38.7,5.1,18.0,7.4
46.8,5.8,33.0,6.7
41.4,,31.0,7.5
37.5,6.0,13.0,5.9
39.0,6.8,25.0,6.0
40.7,5.5,30.0,
30.1,3.1,5.0,5.8
52.9,7.2,47.0,8.3
38.2,4.5,25.0,5.0
31.8,4.9,11.0,6.4
43.3,8.0,23.0,7.6
44.1,,35.0,7.0
42.8,6.6,39.0,5.0
33.6,3.7,21.0,4.4
34.2,6.2,7.0,5.5
48.0,7.0,40.0,7.0
38.0,4.0,35.0,6.0
35.9,4.5,23.0,3.5
40.4,5.9,33.0,4.9
36.8,5.6,27.0,4.3
45.2,4.8,,8.0
35.1,3.9,15.0,5.0
- Then, I open it with pandas and replace the NA values with the column medians.
import pandas as pd

df_raw = pd.read_csv('./raw-data.csv')
df_median = df_raw.apply(lambda x: x.fillna(x.median()), axis=0)
X, Y = df_median.drop(columns='T').to_numpy(), df_median['T'].to_numpy()
- Code used for fitting
from sklearn.linear_model import SGDRegressor

myLM1 = SGDRegressor(verbose=1)
myLM1.fit(X, Y)
Output
-- Epoch 1
Norm: 324955830.41, NNZs: 3, Bias: 14194106.164223, T: 24, Avg. loss: 3973700531196889600.000000
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 16629771036.76, NNZs: 3, Bias: -617317208.989577, T: 48, Avg. loss: 181529035517448937275392.000000
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 70745129973.48, NNZs: 3, Bias: -5129135895.303034, T: 72, Avg. loss: 777346093047846122029056.000000
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 58661193605.90, NNZs: 3, Bias: -3424256200.533171, T: 96, Avg. loss: 551443499407620593156096.000000
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 21611084233.04, NNZs: 3, Bias: 222364194.025302, T: 120, Avg. loss: 523655130718998823436288.000000
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 14729666645.71, NNZs: 3, Bias: 1402055277.659604, T: 144, Avg. loss: 334121112525316814798848.000000
Total training time: 0.00 seconds.
Convergence after 6 epochs took 0.00 seconds
SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
eta0=0.01, fit_intercept=True, l1_ratio=0.15,
learning_rate='invscaling', loss='squared_loss', max_iter=1000,
n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=None,
shuffle=True, tol=0.001, validation_fraction=0.1, verbose=1,
warm_start=False)
- Then, I noticed that by default n_iter_no_change=5. So, I created another model and tested the fitting function again.
myLM2 = SGDRegressor(verbose=1, n_iter_no_change=10)
myLM2.fit(X, Y)
Output
-- Epoch 1
Norm: 212889866.99, NNZs: 3, Bias: -12147107.482672, T: 24, Avg. loss: 10977951332091414528.000000
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 33877658284.48, NNZs: 3, Bias: 3221686871.858737, T: 48, Avg. loss: 218647246678094122057728.000000
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 44245299085.75, NNZs: 3, Bias: 8504092051.958626, T: 72, Avg. loss: 911501681159969846591488.000000
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 41027510933.49, NNZs: 3, Bias: 5572305635.743739, T: 96, Avg. loss: 869704005960650100572160.000000
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 65326809398.75, NNZs: 3, Bias: 748500853.498515, T: 120, Avg. loss: 559005468692885343830016.000000
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 19445131953.06, NNZs: 3, Bias: -3017287484.596134, T: 144, Avg. loss: 640531710195116690898944.000000
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 11535209562.04, NNZs: 3, Bias: -4257020952.531857, T: 168, Avg. loss: 273798336110092244484096.000000
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 9203881559.93, NNZs: 3, Bias: -2973999377.325749, T: 192, Avg. loss: 290728270869741598408704.000000
Total training time: 0.01 seconds.
-- Epoch 9
Norm: 10890181027.35, NNZs: 3, Bias: -2807586298.020673, T: 216, Avg. loss: 14822313322512440623104.000000
Total training time: 0.01 seconds.
-- Epoch 10
Norm: 10738156993.98, NNZs: 3, Bias: -2251398418.644948, T: 240, Avg. loss: 6075130266787779706880.000000
Total training time: 0.01 seconds.
-- Epoch 11
Norm: 5586073729.92, NNZs: 3, Bias: -2490357631.977556, T: 264, Avg. loss: 26258836820902312673280.000000
Total training time: 0.01 seconds.
Convergence after 11 epochs took 0.01 seconds
SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
eta0=0.01, fit_intercept=True, l1_ratio=0.15,
learning_rate='invscaling', loss='squared_loss', max_iter=1000,
n_iter_no_change=10, penalty='l2', power_t=0.25, random_state=None,
shuffle=True, tol=0.001, validation_fraction=0.1, verbose=1,
warm_start=False)
- Now, the notation of the numbers above makes them hard to read. So, I am using a small context manager to collect the loss values and plot them; a rough sketch of what it does is below.
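For reproducibility: my helper is roughly equivalent to the sketch below (the real one also plots the curve). It simply captures the verbose output of fit() and parses the "Avg. loss" values:

import io
import re
from contextlib import redirect_stdout

import numpy as np

class DisplayLossCurve:
    """Capture the verbose output of fit() and print the parsed losses."""

    def __enter__(self):
        self._buffer = io.StringIO()
        self._redirect = redirect_stdout(self._buffer)
        self._redirect.__enter__()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._redirect.__exit__(exc_type, exc_value, traceback)
        text = self._buffer.getvalue()
        print(text, end="")  # keep the original verbose output visible
        losses = np.array([float(m) for m in re.findall(r"Avg\. loss: (\S+)", text)])
        print("=============== Loss Array ===============")
        print(losses)
        return False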
myLM1 = SGDRegressor(verbose=1)
with DisplayLossCurve():
    myLM1.fit(X, Y)
Output
=============== Loss Array ===============
[2.95406140e+14 1.14313907e+23 1.23391908e+24 7.43942473e+23
6.30549394e+22 3.81444250e+23]
As you can see, there is no way the condition for the stopping criterion is satisfied.
- Running the example again for model 2
myLM2 = SGDRegressor(verbose=1, n_iter_no_change=10)
with DisplayLossCurve():
    myLM2.fit(X, Y)
Output
=============== Loss Array ===============
[3.54981262e+18 1.08446422e+24 3.64640361e+23 6.35668752e+23
4.05704140e+23 5.51679293e+23 2.84870055e+23 2.05759404e+23
2.27129487e+23 2.32333059e+22 3.40118957e+21]
I just noticed that when using a pipeline, the model works nicely.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipeline = make_pipeline(StandardScaler(), SGDRegressor(verbose=1))
with DisplayLossCurve():
    pipeline.fit(X, Y)
Output
=============== Loss Array ===============
[698.789267 549.568875 455.764301 385.315963 329.898958 284.905019
247.724132 216.647281 190.31711 167.868168 148.596037 131.948425
117.494491 104.881235 93.84543 84.139548 75.593668 68.045718
61.354433 55.428558 50.143342 45.435389 41.225132 37.463083
34.092248 31.066473 28.348297 25.904551 23.700976 21.710897
19.915563 18.292876 16.823909 15.495913 14.289865 13.195086
12.199626 11.295545 10.472725 9.723792 9.042427 8.420886
7.853342 7.334551 6.861378 6.428062 6.032399 5.671314
5.339777 5.035661 4.757 4.501251 4.266443 4.051114
3.85367 3.67168 3.504323 3.350221 3.208313 3.077853
2.957661 2.847504 2.7454 2.651231 2.564548 2.484364
2.410306 2.342217 2.279268 2.220999 2.167082 2.117059
2.070652 2.027802 1.988273 1.951822 1.917857 1.886282
1.857099 1.829877 1.804733 1.781529 1.760066 1.739988
1.721407 1.704004 1.687848 1.67277 1.658884 1.646049
1.634011 1.622827 1.612362 1.602604 1.593531 1.585142
1.577364 1.570052 1.563283 1.556888 1.550974 1.545459
1.540315 1.535562 1.53106 1.526853 1.522848 1.51919
1.51574 1.5125 1.509578 1.506747 1.50413 1.501607
1.499332 1.497184 1.495179 1.493289 1.491581 1.489939
1.488343 1.486849 1.485497 1.484214 1.483014 1.48191
1.480837 1.479798 1.478842 1.47799 1.47716 1.476378
1.475653]
Am I missing some initialization step when training without the assistance of a pipeline? Pretty much the only difference is the normalization.
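To be explicit about what I mean by the normalization being the only difference, this is the manual equivalent I have in mind (scaling by hand instead of inside the pipeline; myLM3 is just an illustrative name):

from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Scale the features by hand, then fit exactly as before.
X_scaled = StandardScaler().fit_transform(X)
myLM3 = SGDRegressor(verbose=1)
myLM3.fit(X_scaled, Y)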
- I do understand how beneficial normalization is for the model. But the behaviour described previously should not happen simply because of the values in the data.
- I noticed the values are exploding. I wonder if an overflow happened and broke the comparison done by the stopping criterion. But still, early_stopping is set to False by default.
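If it helps, this is how I would check whether something actually overflowed (I have not verified that this is what happens; float64 only overflows around 1.8e308):

import numpy as np

print(np.finfo(np.float64).max)        # ~1.8e308, the float64 overflow threshold
print(np.isfinite(myLM1.coef_).all())  # are the learned weights still finite?
print(np.isfinite(myLM1.intercept_))   # is the learned bias still finite?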