Bug with early stopping and n_iter_no_change when training a model #19451

@eduardo4jesus

Description

As far as I could search, I didn't find any existing report of this bug. I'm not sure I used the right keywords, but here we go...

Describe the bug

I am training a model, and after 6 epochs the training is interrupted. According to the documentation, early_stopping is False by default. And even if it were True, the condition described for the argument tol is not fulfilled when I analyze the loss values over the epochs.

tol : float, default=1e-3
    The stopping criterion. If it is not None, training will stop when (loss > best_loss - tol) for n_iter_no_change consecutive epochs.
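
My reading of that criterion, as a rough sketch (the function and variable names below are mine, not scikit-learn's internals):

def stops_after(losses, tol=1e-3, n_iter_no_change=5):
    """Return the 1-based epoch at which the criterion would trigger, or None."""
    best_loss = float('inf')
    no_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if tol is not None and loss > best_loss - tol:
            no_improvement += 1  # this epoch did not improve on best_loss by tol
        else:
            no_improvement = 0
        best_loss = min(best_loss, loss)
        if no_improvement >= n_iter_no_change:
            return epoch
    return None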

Steps/Code to Reproduce

  1. I am using this dataset:

raw-data.csv

T,A1,A2,A3
33.2,3.5,9.0,6.1
40.3,5.3,20.0,6.4
38.7,5.1,18.0,7.4
46.8,5.8,33.0,6.7
41.4,,31.0,7.5
37.5,6.0,13.0,5.9
39.0,6.8,25.0,6.0
40.7,5.5,30.0,
30.1,3.1,5.0,5.8
52.9,7.2,47.0,8.3
38.2,4.5,25.0,5.0
31.8,4.9,11.0,6.4
43.3,8.0,23.0,7.6
44.1,,35.0,7.0
42.8,6.6,39.0,5.0
33.6,3.7,21.0,4.4
34.2,6.2,7.0,5.5
48.0,7.0,40.0,7.0
38.0,4.0,35.0,6.0
35.9,4.5,23.0,3.5
40.4,5.9,33.0,4.9
36.8,5.6,27.0,4.3
45.2,4.8,,8.0
35.1,3.9,15.0,5.0
  2. Then, I open it with pandas and replace the NA values with the column medians:
import pandas as pd

df_raw = pd.read_csv('./raw-data.csv')
df_median = df_raw.apply(lambda x: x.fillna(x.median()), axis=0)
X, Y = df_median.drop(columns='T').to_numpy(), df_median['T'].to_numpy()
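
(For what it's worth, the same median fill can also be done with scikit-learn's SimpleImputer; an equivalent sketch, not what I actually ran:)

from sklearn.impute import SimpleImputer
import pandas as pd

df_raw = pd.read_csv('./raw-data.csv')
imputer = SimpleImputer(strategy='median')  # per-column median, like the pandas version
X = imputer.fit_transform(df_raw.drop(columns='T'))
Y = df_raw['T'].to_numpy()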
  3. Code used for fitting:
from sklearn.linear_model import SGDRegressor

myLM1 = SGDRegressor(verbose=1)
myLM1.fit(X, Y)

Output

-- Epoch 1
Norm: 324955830.41, NNZs: 3, Bias: 14194106.164223, T: 24, Avg. loss: 3973700531196889600.000000
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 16629771036.76, NNZs: 3, Bias: -617317208.989577, T: 48, Avg. loss: 181529035517448937275392.000000
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 70745129973.48, NNZs: 3, Bias: -5129135895.303034, T: 72, Avg. loss: 777346093047846122029056.000000
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 58661193605.90, NNZs: 3, Bias: -3424256200.533171, T: 96, Avg. loss: 551443499407620593156096.000000
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 21611084233.04, NNZs: 3, Bias: 222364194.025302, T: 120, Avg. loss: 523655130718998823436288.000000
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 14729666645.71, NNZs: 3, Bias: 1402055277.659604, T: 144, Avg. loss: 334121112525316814798848.000000
Total training time: 0.00 seconds.
Convergence after 6 epochs took 0.00 seconds
SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
             eta0=0.01, fit_intercept=True, l1_ratio=0.15,
             learning_rate='invscaling', loss='squared_loss', max_iter=1000,
             n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=None,
             shuffle=True, tol=0.001, validation_fraction=0.1, verbose=1,
             warm_start=False)
  4. Then I noticed that n_iter_no_change=5 by default. So I created another model and tested the fitting function again:
myLM2 = SGDRegressor(verbose=1, n_iter_no_change=10)
myLM2.fit(X, Y)

Output

-- Epoch 1
Norm: 212889866.99, NNZs: 3, Bias: -12147107.482672, T: 24, Avg. loss: 10977951332091414528.000000
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 33877658284.48, NNZs: 3, Bias: 3221686871.858737, T: 48, Avg. loss: 218647246678094122057728.000000
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 44245299085.75, NNZs: 3, Bias: 8504092051.958626, T: 72, Avg. loss: 911501681159969846591488.000000
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 41027510933.49, NNZs: 3, Bias: 5572305635.743739, T: 96, Avg. loss: 869704005960650100572160.000000
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 65326809398.75, NNZs: 3, Bias: 748500853.498515, T: 120, Avg. loss: 559005468692885343830016.000000
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 19445131953.06, NNZs: 3, Bias: -3017287484.596134, T: 144, Avg. loss: 640531710195116690898944.000000
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 11535209562.04, NNZs: 3, Bias: -4257020952.531857, T: 168, Avg. loss: 273798336110092244484096.000000
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 9203881559.93, NNZs: 3, Bias: -2973999377.325749, T: 192, Avg. loss: 290728270869741598408704.000000
Total training time: 0.01 seconds.
-- Epoch 9
Norm: 10890181027.35, NNZs: 3, Bias: -2807586298.020673, T: 216, Avg. loss: 14822313322512440623104.000000
Total training time: 0.01 seconds.
-- Epoch 10
Norm: 10738156993.98, NNZs: 3, Bias: -2251398418.644948, T: 240, Avg. loss: 6075130266787779706880.000000
Total training time: 0.01 seconds.
-- Epoch 11
Norm: 5586073729.92, NNZs: 3, Bias: -2490357631.977556, T: 264, Avg. loss: 26258836820902312673280.000000
Total training time: 0.01 seconds.
Convergence after 11 epochs took 0.01 seconds
SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
             eta0=0.01, fit_intercept=True, l1_ratio=0.15,
             learning_rate='invscaling', loss='squared_loss', max_iter=1000,
             n_iter_no_change=10, penalty='l2', power_t=0.25, random_state=None,
             shuffle=True, tol=0.001, validation_fraction=0.1, verbose=1,
             warm_start=False)
  5. The scientific notation above makes the numbers hard to read, so I am using a small context-manager helper to plot a chart.
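
DisplayLossCurve is a small helper of mine that isn't shown here. A minimal sketch of what it does (it swallows the estimator's verbose log, parses the Avg. loss values, prints them as an array, and plots them; the exact implementation below is a reconstruction):

import io
import re
from contextlib import redirect_stdout

import matplotlib.pyplot as plt
import numpy as np

class DisplayLossCurve:
    def __enter__(self):
        # capture everything the estimator prints while fitting
        self._buffer = io.StringIO()
        self._redirect = redirect_stdout(self._buffer)
        self._redirect.__enter__()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._redirect.__exit__(exc_type, exc_value, traceback)
        log = self._buffer.getvalue()
        # pull the per-epoch average loss out of the verbose log
        losses = np.array([float(x) for x in re.findall(r'Avg\. loss: ([0-9.]+)', log)])
        print('=============== Loss Array ===============')
        print(losses)
        plt.plot(losses)
        plt.xlabel('epoch')
        plt.ylabel('average loss')
        plt.show()
        return False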
myLM1 = SGDRegressor(verbose=1)
with DisplayLossCurve():
  myLM1.fit(X, Y)

Output

=============== Loss Array ===============
[2.95406140e+14 1.14313907e+23 1.23391908e+24 7.43942473e+23
 6.30549394e+22 3.81444250e+23]

[Chart: loss per epoch for myLM1]

As you can see from these values, there is no way that the condition for the early stop is true.
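
For anyone who wants to double-check, the criterion can be tested directly against the printed losses (a quick sketch; the values are copied from the array above):

import numpy as np

losses = np.array([2.95406140e+14, 1.14313907e+23, 1.23391908e+24,
                   7.43942473e+23, 6.30549394e+22, 3.81444250e+23])
tol = 1e-3
best_loss = np.inf
for epoch, loss in enumerate(losses, start=1):
    no_improvement = loss > best_loss - tol  # True means no improvement this epoch
    print(epoch, no_improvement)
    best_loss = min(best_loss, loss)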

  6. Running the example again for model 2:
myLM2 = SGDRegressor(verbose=1, n_iter_no_change=10)
with DisplayLossCurve():
  myLM2.fit(X, Y)

Output

=============== Loss Array ===============
[3.54981262e+18 1.08446422e+24 3.64640361e+23 6.35668752e+23
 4.05704140e+23 5.51679293e+23 2.84870055e+23 2.05759404e+23
 2.27129487e+23 2.32333059e+22 3.40118957e+21]

[Chart: loss per epoch for myLM2]


I just noticed that when using a pipeline, the model works nicely.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipeline = make_pipeline(StandardScaler(), SGDRegressor(verbose=1))
with DisplayLossCurve():
  pipeline.fit(X, Y)

Output

=============== Loss Array ===============
[698.789267 549.568875 455.764301 385.315963 329.898958 284.905019
 247.724132 216.647281 190.31711  167.868168 148.596037 131.948425
 117.494491 104.881235  93.84543   84.139548  75.593668  68.045718
  61.354433  55.428558  50.143342  45.435389  41.225132  37.463083
  34.092248  31.066473  28.348297  25.904551  23.700976  21.710897
  19.915563  18.292876  16.823909  15.495913  14.289865  13.195086
  12.199626  11.295545  10.472725   9.723792   9.042427   8.420886
   7.853342   7.334551   6.861378   6.428062   6.032399   5.671314
   5.339777   5.035661   4.757      4.501251   4.266443   4.051114
   3.85367    3.67168    3.504323   3.350221   3.208313   3.077853
   2.957661   2.847504   2.7454     2.651231   2.564548   2.484364
   2.410306   2.342217   2.279268   2.220999   2.167082   2.117059
   2.070652   2.027802   1.988273   1.951822   1.917857   1.886282
   1.857099   1.829877   1.804733   1.781529   1.760066   1.739988
   1.721407   1.704004   1.687848   1.67277    1.658884   1.646049
   1.634011   1.622827   1.612362   1.602604   1.593531   1.585142
   1.577364   1.570052   1.563283   1.556888   1.550974   1.545459
   1.540315   1.535562   1.53106    1.526853   1.522848   1.51919
   1.51574    1.5125     1.509578   1.506747   1.50413    1.501607
   1.499332   1.497184   1.495179   1.493289   1.491581   1.489939
   1.488343   1.486849   1.485497   1.484214   1.483014   1.48191
   1.480837   1.479798   1.478842   1.47799    1.47716    1.476378
   1.475653]

[Chart: loss per epoch for the pipeline model]

Am I missing some initialization step when training without a pipeline? Pretty much the only difference is the feature scaling (see the sketch after the list below).

  • I do understand how beneficial scaling is for the model, but the behaviour described above should not happen simply because of the scale of the data.
  • I noticed the loss values are exploding. I wonder if an overflow corrupted the comparison done by the stopping check. But still, early_stopping is set to False by default.
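
For comparison, the scaling can also be applied by hand, without a pipeline; a sketch of what I mean (myLM3 is just a name for this test). I would expect it to behave like the pipeline version, since the pipeline does the same scaling:

from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # zero mean, unit variance per feature
myLM3 = SGDRegressor(verbose=1)
myLM3.fit(X_scaled, Y)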
